Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-27 Thread Daniel Carrasco
Hello,

I'm just writing to say that after more than a week the server is still working
without problems and the OSDs are no longer marked down erroneously. In my tests
the webpage stops working for less than a minute when I stop an OSD, so the
failover is working fine.

Greetings and thanks for all your help!!

2017-06-15 19:04 GMT+02:00 Daniel Carrasco :

> Hello, thanks for the info.
>
> I'll give it a try tomorrow. In one of my tests I got the messages you mention
> (wrongfully marked), but I've lowered other options and now it's fine. For
> now the OSDs are not reporting down messages even under a high load test,
> but I'll check the logs tomorrow to confirm.
>
> Most of the time the server is used read-only and the load is not high, so if
> an OSD is marked down for a few seconds it is not a big problem (at least I
> think the recovery traffic is low, because it only has to check that the PGs
> are in both OSDs).
>
> Greetings and thanks again!
>
> 2017-06-15 18:13 GMT+02:00 David Turner :
>
>> osd_heartbeat_grace is a setting for how many seconds since the last time
>> an osd received a successful response from another osd before telling the
>> mons that it's down.  This is one you may want to lower from its default
>> value of 20 seconds.
>>
>> mon_osd_min_down_reporters is a setting for how many osds need to report
>> an osd as down before the mons will mark it as down.  I recommend setting
>> this to N+1 where N is how many osds you have in a node or failure domain.
>> If you end up with a network problem and you have 1 osd node that can talk
>> to the mons, but not the other osd nodes, then you will end up with that
>> one node marking the entire cluster down while the rest of the cluster
>> marks that node down. If your min_down_reporters is N+1, then 1 node cannot
>> mark down the rest of the cluster.  The default setting is 1 so that small
>> test clusters can mark down osds, but if you have 3+ nodes, you should set
>> it to N+1 if you can.  Setting it to more than 2 nodes is equally as
>> problematic.  However, if you just want things to report as fast as
>> possible, leaving this at 1 still might be optimal to getting it marked
>> down sooner.
>>
>> The downside to lowering these settings is if OSDs are getting marked
>> down for running slower, then they will re-assert themselves to the mons
>> and end up causing backfilling and peering for no really good reason.
>> You'll want to monitor your cluster for OSDs being marked down for a few
>> seconds before marking themselves back up.  You can see this in the OSD
>> logs where the OSD says it was wrongfully marked down in one line and then
>> the next is where it tells the mons it is actually up.
>>
>> On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco 
>> wrote:
>>
>>> I forgot to say that after upgrade the machine RAM to 4Gb, the OSD
>>> daemons has started to use only a 5% (about 200MB). Is like magic, and now
>>> I've about 3.2Gb of free RAM.
>>>
>>> Greetings!!
>>>
>>> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco :
>>>
 Finally, the problem was W3Total Cache, that seems to be unable to
 manage HA and when the master redis host is down, it stop working without
 try the slave.

 I've added some options to make it faster to detect a down OSD and the
 page is online again in about 40s.

 [global]
 fsid = Hidden
 mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
 mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
 auth_cluster_required = cephx
 auth_service_required = cephx
 auth_client_required = cephx
 osd mon heartbeat interval = 5
 osd mon report interval max = 10
 mon osd report timeout = 15
 osd fast fail on connection refused = True

 public network = 10.20.1.0/24
 osd pool default size = 2


 Greetings and thanks for all your help.

 2017-06-14 23:09 GMT+02:00 David Turner :

> I've used the kernel client and the ceph-fuse driver for mapping the
> cephfs volume.  I didn't notice any network hiccups while failing over, 
> but
> I was reading large files during my tests (and live) and some caching may
> have hidden network hiccups for my use case.
>
> Going back to the memory potentially being a problem.  Ceph has a
> tendency to start using 2-3x more memory while it's in a degraded state as
> opposed to when everything is health_ok.  Always plan for 
> over-provisioning
> your memory to account for a minimum of 2x.  I've seen clusters stuck in 
> an
> OOM killer death spiral because it kept killing OSDs for running out of
> memory, that caused more peering and backfilling, ... which caused more
> OSDs to be killed by OOM killer.
>
> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco 
> wrote:
>
>> Is strange because on my test cluster 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-15 Thread Daniel Carrasco
Hello, thanks for the info.

I'll give it a try tomorrow. In one of my tests I got the messages you
mention (wrongfully marked), but I've lowered other options and now it's fine.
For now the OSDs are not reporting down messages even under a high load test,
but I'll check the logs tomorrow to confirm.

Most of the time the server is used read-only and the load is not high, so if
an OSD is marked down for a few seconds it is not a big problem (at least I
think the recovery traffic is low, because it only has to check that the PGs
are in both OSDs).

Greetings and thanks again!

2017-06-15 18:13 GMT+02:00 David Turner :

> osd_heartbeat_grace is a setting for how many seconds since the last time
> an osd received a successful response from another osd before telling the
> mons that it's down.  This is one you may want to lower from its default
> value of 20 seconds.
>
> mon_osd_min_down_reporters is a setting for how many osds need to report
> an osd as down before the mons will mark it as down.  I recommend setting
> this to N+1 where N is how many osds you have in a node or failure domain.
> If you end up with a network problem and you have 1 osd node that can talk
> to the mons, but not the other osd nodes, then you will end up with that
> one node marking the entire cluster down while the rest of the cluster
> marks that node down. If your min_down_reporters is N+1, then 1 node cannot
> mark down the rest of the cluster.  The default setting is 1 so that small
> test clusters can mark down osds, but if you have 3+ nodes, you should set
> it to N+1 if you can.  Setting it to more than 2 nodes is equally as
> problematic.  However, if you just want things to report as fast as
> possible, leaving this at 1 still might be optimal to getting it marked
> down sooner.
>
> The downside to lowering these settings is if OSDs are getting marked down
> for running slower, then they will re-assert themselves to the mons and end
> up causing backfilling and peering for no really good reason.  You'll want
> to monitor your cluster for OSDs being marked down for a few seconds before
> marking themselves back up.  You can see this in the OSD logs where the OSD
> says it was wrongfully marked down in one line and then the next is where
> it tells the mons it is actually up.
>
> On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco 
> wrote:
>
>> I forgot to say that after upgrade the machine RAM to 4Gb, the OSD
>> daemons has started to use only a 5% (about 200MB). Is like magic, and now
>> I've about 3.2Gb of free RAM.
>>
>> Greetings!!
>>
>> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco :
>>
>>> Finally, the problem was W3Total Cache, that seems to be unable to
>>> manage HA and when the master redis host is down, it stop working without
>>> try the slave.
>>>
>>> I've added some options to make it faster to detect a down OSD and the
>>> page is online again in about 40s.
>>>
>>> [global]
>>> fsid = Hidden
>>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
>>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
>>> auth_cluster_required = cephx
>>> auth_service_required = cephx
>>> auth_client_required = cephx
>>> osd mon heartbeat interval = 5
>>> osd mon report interval max = 10
>>> mon osd report timeout = 15
>>> osd fast fail on connection refused = True
>>>
>>> public network = 10.20.1.0/24
>>> osd pool default size = 2
>>>
>>>
>>> Greetings and thanks for all your help.
>>>
>>> 2017-06-14 23:09 GMT+02:00 David Turner :
>>>
 I've used the kernel client and the ceph-fuse driver for mapping the
 cephfs volume.  I didn't notice any network hiccups while failing over, but
 I was reading large files during my tests (and live) and some caching may
 have hidden network hiccups for my use case.

 Going back to the memory potentially being a problem.  Ceph has a
 tendency to start using 2-3x more memory while it's in a degraded state as
 opposed to when everything is health_ok.  Always plan for over-provisioning
 your memory to account for a minimum of 2x.  I've seen clusters stuck in an
 OOM killer death spiral because it kept killing OSDs for running out of
 memory, that caused more peering and backfilling, ... which caused more
 OSDs to be killed by OOM killer.

 On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco 
 wrote:

> Is strange because on my test cluster (three nodes) with two nodes
> with OSD, and all with MON and MDS, I've configured the size to 2 and
> min_size to 1, I've restarted all nodes one by one and the client loose 
> the
> connection for about 5 seconds until connect to other MDS.
>
> Are you using ceph client or kernel client?
> I forgot to say that I'm using Debian 8.
>
> Anyway, maybe the problem was what I've said before, the clients
> connection with that node started to fail, but the node 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-15 Thread David Turner
osd_heartbeat_grace controls how many seconds can pass since an OSD last
received a successful heartbeat response from another OSD before that peer
reports it to the mons as down.  This is one you may want to lower from its
default value of 20 seconds.

mon_osd_min_down_reporters controls how many OSDs need to report another
OSD as down before the mons will mark it down.  I recommend setting this
to N+1, where N is how many OSDs you have in a node or failure domain.  If
you end up with a network problem and you have 1 OSD node that can talk to
the mons, but not to the other OSD nodes, then you will end up with that one
node marking the entire cluster down while the rest of the cluster marks
that node down.  If your min_down_reporters is N+1, then 1 node cannot mark
down the rest of the cluster.  The default setting is 1 so that small test
clusters can mark down OSDs, but if you have 3+ nodes, you should set it to
N+1 if you can.  Setting it so high that it would require reporters from more
than 2 nodes is equally problematic.  However, if you just want things to be
reported as fast as possible, leaving this at 1 might still be the best way to
get an OSD marked down sooner.

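For illustration, a minimal ceph.conf sketch with those two options could look
like this (the values are only a starting point, not a recommendation for every
cluster; with 1 OSD per node, as in your setup, N+1 works out to 2):

[global]
# report a peer OSD to the mons after 10s without a successful heartbeat (default 20)
osd heartbeat grace = 10
# require reports from at least 2 OSDs (N+1 for 1 OSD per node) before marking one down
mon osd min down reporters = 2
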
The downside to lowering these settings is that if OSDs get marked down
just for running slowly, they will re-assert themselves to the mons and end
up causing backfilling and peering for no good reason.  You'll want
to monitor your cluster for OSDs being marked down for a few seconds and then
marking themselves back up.  You can see this in the OSD logs, where the OSD
says it was wrongfully marked down on one line and the next line is where
it tells the mons it is actually up.

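If you lower these, one way to keep an eye on flapping afterwards is to watch
the cluster log with ceph -w and to grep the OSD logs for the wrongly-marked-down
message; the exact wording can differ between releases, but something like this
should catch it:

grep -i "wrongly marked me down" /var/log/ceph/ceph-osd.*.log
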
On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco 
wrote:

> I forgot to say that after upgrade the machine RAM to 4Gb, the OSD daemons
> has started to use only a 5% (about 200MB). Is like magic, and now I've
> about 3.2Gb of free RAM.
>
> Greetings!!
>
> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco :
>
>> Finally, the problem was W3Total Cache, that seems to be unable to manage
>> HA and when the master redis host is down, it stop working without try the
>> slave.
>>
>> I've added some options to make it faster to detect a down OSD and the
>> page is online again in about 40s.
>>
>> [global]
>> fsid = Hidden
>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> osd mon heartbeat interval = 5
>> osd mon report interval max = 10
>> mon osd report timeout = 15
>> osd fast fail on connection refused = True
>>
>> public network = 10.20.1.0/24
>> osd pool default size = 2
>>
>>
>> Greetings and thanks for all your help.
>>
>> 2017-06-14 23:09 GMT+02:00 David Turner :
>>
>>> I've used the kernel client and the ceph-fuse driver for mapping the
>>> cephfs volume.  I didn't notice any network hiccups while failing over, but
>>> I was reading large files during my tests (and live) and some caching may
>>> have hidden network hiccups for my use case.
>>>
>>> Going back to the memory potentially being a problem.  Ceph has a
>>> tendency to start using 2-3x more memory while it's in a degraded state as
>>> opposed to when everything is health_ok.  Always plan for over-provisioning
>>> your memory to account for a minimum of 2x.  I've seen clusters stuck in an
>>> OOM killer death spiral because it kept killing OSDs for running out of
>>> memory, that caused more peering and backfilling, ... which caused more
>>> OSDs to be killed by OOM killer.
>>>
>>> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco 
>>> wrote:
>>>
 Is strange because on my test cluster (three nodes) with two nodes with
 OSD, and all with MON and MDS, I've configured the size to 2 and min_size
 to 1, I've restarted all nodes one by one and the client loose the
 connection for about 5 seconds until connect to other MDS.

 Are you using ceph client or kernel client?
 I forgot to say that I'm using Debian 8.

 Anyway, maybe the problem was what I've said before, the clients
 connection with that node started to fail, but the node was not officially
 down. And it wasn't a client problem, because it happened on both clients
 and on my monitoring service at same time.

 Just now I'm not on the office, so I can't post the config file.
 Tomorrow I'll send it.
 Anyway, is the basic file generated by ceph-deploy with client network
 and min_size configurations. Just like my test config.

 Thanks!!, and greetings!!

 On 14 Jun. 2017, 10:38 p.m., "David Turner" wrote:

 I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1
 at a time to do ceph and kernel upgrades.  The VM's running out of ceph,
 the clients accessing MDS, etc all keep working fine without any problem

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-15 Thread Daniel Carrasco
I forgot to say that after upgrading the machine RAM to 4 GB, the OSD daemons
have started to use only about 5% (around 200 MB). It's like magic, and now I
have about 3.2 GB of free RAM.

Greetings!!

2017-06-15 15:08 GMT+02:00 Daniel Carrasco :

> Finally, the problem was W3Total Cache, that seems to be unable to manage
> HA and when the master redis host is down, it stop working without try the
> slave.
>
> I've added some options to make it faster to detect a down OSD and the
> page is online again in about 40s.
>
> [global]
> fsid = Hidden
> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd mon heartbeat interval = 5
> osd mon report interval max = 10
> mon osd report timeout = 15
> osd fast fail on connection refused = True
>
> public network = 10.20.1.0/24
> osd pool default size = 2
>
>
> Greetings and thanks for all your help.
>
> 2017-06-14 23:09 GMT+02:00 David Turner :
>
>> I've used the kernel client and the ceph-fuse driver for mapping the
>> cephfs volume.  I didn't notice any network hiccups while failing over, but
>> I was reading large files during my tests (and live) and some caching may
>> have hidden network hiccups for my use case.
>>
>> Going back to the memory potentially being a problem.  Ceph has a
>> tendency to start using 2-3x more memory while it's in a degraded state as
>> opposed to when everything is health_ok.  Always plan for over-provisioning
>> your memory to account for a minimum of 2x.  I've seen clusters stuck in an
>> OOM killer death spiral because it kept killing OSDs for running out of
>> memory, that caused more peering and backfilling, ... which caused more
>> OSDs to be killed by OOM killer.
>>
>> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco 
>> wrote:
>>
>>> Is strange because on my test cluster (three nodes) with two nodes with
>>> OSD, and all with MON and MDS, I've configured the size to 2 and min_size
>>> to 1, I've restarted all nodes one by one and the client loose the
>>> connection for about 5 seconds until connect to other MDS.
>>>
>>> Are you using ceph client or kernel client?
>>> I forgot to say that I'm using Debian 8.
>>>
>>> Anyway, maybe the problem was what I've said before, the clients
>>> connection with that node started to fail, but the node was not officially
>>> down. And it wasn't a client problem, because it happened on both clients
>>> and on my monitoring service at same time.
>>>
>>> Just now I'm not on the office, so I can't post the config file.
>>> Tomorrow I'll send it.
>>> Anyway, is the basic file generated by ceph-deploy with client network
>>> and min_size configurations. Just like my test config.
>>>
>>> Thanks!!, and greetings!!
>>>
>>> On 14 Jun. 2017, 10:38 p.m., "David Turner" wrote:
>>>
>>> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at
>>> a time to do ceph and kernel upgrades.  The VM's running out of ceph, the
>>> clients accessing MDS, etc all keep working fine without any problem during
>>> these restarts.  What is your full ceph configuration?  There must be
>>> something not quite right in there.
>>>
>>> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco 
>>> wrote:
>>>


 On 14 Jun. 2017, 10:08 p.m., "David Turner" wrote:

 Not just the min_size of your cephfs data pool, but also your
 cephfs_metadata pool.


 Both were at 1. I don't know why because I don't remember to have
 changed the min_size and the cluster has 3 odd from beginning (I did
 it on another cluster for testing purposes, but I don't remember to have
 changed on this). I've changed both to two, but after the fail.

 About the size, I use 50Gb because it's for a single webpage and I
 don't need more space.

 I'll try to increase the memory to 3Gb.

 Greetings!!


 On Wed, Jun 14, 2017 at 4:07 PM David Turner 
 wrote:

> Ceph recommends 1GB of RAM for ever 1TB of OSD space.  Your 2GB nodes
> are definitely on the low end.  50GB OSDs... I don't know what that will
> require, but where you're running the mon and mds on the same node, I'd
> still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
> surprising, even at that size.
>
> When you say you increased the size of the pools to 3, what did you do
> to the min_size?  Is that still set to 2?
>
> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
> wrote:
>
>> Finally I've created three nodes, I've increased the size of pools to
>> 3 and I've created 3 MDS (active, standby, standby).
>>
>> Today the server has decided to fail and I've noticed that 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-15 Thread Daniel Carrasco
Finally, the problem was W3 Total Cache, which seems to be unable to handle
HA: when the master Redis host is down, it stops working without trying the
slave.

I've added some options to detect a down OSD faster, and the page
is now online again in about 40 seconds.

[global]
fsid = Hidden
mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03
mon_host = 10.20.1.109,10.20.1.97,10.20.1.216
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd mon heartbeat interval = 5
osd mon report interval max = 10
mon osd report timeout = 15
osd fast fail on connection refused = True

public network = 10.20.1.0/24
osd pool default size = 2

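In case it is useful, the effective values can also be checked or changed on a
running OSD without restarting it (osd.0 is just an example id, and injectargs
changes are not persistent across restarts):

ceph daemon osd.0 config get osd_mon_heartbeat_interval
ceph tell osd.* injectargs '--osd_mon_heartbeat_interval 5'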

Greetings and thanks for all your help.

2017-06-14 23:09 GMT+02:00 David Turner :

> I've used the kernel client and the ceph-fuse driver for mapping the
> cephfs volume.  I didn't notice any network hiccups while failing over, but
> I was reading large files during my tests (and live) and some caching may
> have hidden network hiccups for my use case.
>
> Going back to the memory potentially being a problem.  Ceph has a tendency
> to start using 2-3x more memory while it's in a degraded state as opposed
> to when everything is health_ok.  Always plan for over-provisioning your
> memory to account for a minimum of 2x.  I've seen clusters stuck in an OOM
> killer death spiral because it kept killing OSDs for running out of memory,
> that caused more peering and backfilling, ... which caused more OSDs to be
> killed by OOM killer.
>
> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco 
> wrote:
>
>> Is strange because on my test cluster (three nodes) with two nodes with
>> OSD, and all with MON and MDS, I've configured the size to 2 and min_size
>> to 1, I've restarted all nodes one by one and the client loose the
>> connection for about 5 seconds until connect to other MDS.
>>
>> Are you using ceph client or kernel client?
>> I forgot to say that I'm using Debian 8.
>>
>> Anyway, maybe the problem was what I've said before, the clients
>> connection with that node started to fail, but the node was not officially
>> down. And it wasn't a client problem, because it happened on both clients
>> and on my monitoring service at same time.
>>
>> Just now I'm not on the office, so I can't post the config file. Tomorrow
>> I'll send it.
>> Anyway, is the basic file generated by ceph-deploy with client network
>> and min_size configurations. Just like my test config.
>>
>> Thanks!!, and greetings!!
>>
>> On 14 Jun. 2017, 10:38 p.m., "David Turner" wrote:
>>
>> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at
>> a time to do ceph and kernel upgrades.  The VM's running out of ceph, the
>> clients accessing MDS, etc all keep working fine without any problem during
>> these restarts.  What is your full ceph configuration?  There must be
>> something not quite right in there.
>>
>> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco 
>> wrote:
>>
>>>
>>>
>>> On 14 Jun. 2017, 10:08 p.m., "David Turner" wrote:
>>>
>>> Not just the min_size of your cephfs data pool, but also your
>>> cephfs_metadata pool.
>>>
>>>
>>> Both were at 1. I don't know why because I don't remember to have
>>> changed the min_size and the cluster has 3 odd from beginning (I did it
>>> on another cluster for testing purposes, but I don't remember to have
>>> changed on this). I've changed both to two, but after the fail.
>>>
>>> About the size, I use 50Gb because it's for a single webpage and I don't
>>> need more space.
>>>
>>> I'll try to increase the memory to 3Gb.
>>>
>>> Greetings!!
>>>
>>>
>>> On Wed, Jun 14, 2017 at 4:07 PM David Turner 
>>> wrote:
>>>
 Ceph recommends 1GB of RAM for ever 1TB of OSD space.  Your 2GB nodes
 are definitely on the low end.  50GB OSDs... I don't know what that will
 require, but where you're running the mon and mds on the same node, I'd
 still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
 surprising, even at that size.

 When you say you increased the size of the pools to 3, what did you do
 to the min_size?  Is that still set to 2?

 On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
 wrote:

> Finally I've created three nodes, I've increased the size of pools to
> 3 and I've created 3 MDS (active, standby, standby).
>
> Today the server has decided to fail and I've noticed that failover is
> not working... The ceph -s command shows like everything was OK but the
> clients weren't able to connect and I had to restart the failing node and
> reconect the clients manually to make it work again (even I think that the
> active MDS was in another node).
>
> I don't know if maybe is because the server was not fully down, and
> only some 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-14 Thread David Turner
I've used both the kernel client and the ceph-fuse driver for mapping the cephfs
volume.  I didn't notice any network hiccups while failing over, but I was
reading large files during my tests (and live), and some caching may have
hidden network hiccups for my use case.

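For anyone comparing the two, the mounts look roughly like this (the monitor
address is the first one from the ceph.conf earlier in this thread; the secret
file path and mount point are just placeholders):

# kernel client
mount -t ceph 10.20.1.109:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
# FUSE client
ceph-fuse -m 10.20.1.109:6789 /mnt/cephfs
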
Going back to the memory potentially being a problem: Ceph has a tendency
to use 2-3x more memory while it's in a degraded state than
when everything is health_ok.  Always plan to over-provision your
memory to account for a minimum of 2x.  I've seen clusters stuck in an OOM
killer death spiral: the OOM killer kept killing OSDs for running out of memory,
which caused more peering and backfilling, which caused more OSDs to be
killed by the OOM killer, and so on.

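On the same theme, when a degraded cluster is tight on memory it is common to
throttle recovery so it does not pile on. A sketch of options that are often
lowered for this (defaults vary by release, and this trades recovery speed for
stability):

[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 1
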
On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco 
wrote:

> Is strange because on my test cluster (three nodes) with two nodes with
> OSD, and all with MON and MDS, I've configured the size to 2 and min_size
> to 1, I've restarted all nodes one by one and the client loose the
> connection for about 5 seconds until connect to other MDS.
>
> Are you using ceph client or kernel client?
> I forgot to say that I'm using Debian 8.
>
> Anyway, maybe the problem was what I've said before, the clients
> connection with that node started to fail, but the node was not officially
> down. And it wasn't a client problem, because it happened on both clients
> and on my monitoring service at same time.
>
> Just now I'm not on the office, so I can't post the config file. Tomorrow
> I'll send it.
> Anyway, is the basic file generated by ceph-deploy with client network and
> min_size configurations. Just like my test config.
>
> Thanks!!, and greetings!!
>
> On 14 Jun. 2017, 10:38 p.m., "David Turner" wrote:
>
> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at a
> time to do ceph and kernel upgrades.  The VM's running out of ceph, the
> clients accessing MDS, etc all keep working fine without any problem during
> these restarts.  What is your full ceph configuration?  There must be
> something not quite right in there.
>
> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco 
> wrote:
>
>>
>>
>> On 14 Jun. 2017, 10:08 p.m., "David Turner" wrote:
>>
>> Not just the min_size of your cephfs data pool, but also your
>> cephfs_metadata pool.
>>
>>
>> Both were at 1. I don't know why because I don't remember to have changed
>> the min_size and the cluster has 3 odd from beginning (I did it on
>> another cluster for testing purposes, but I don't remember to have changed
>> on this). I've changed both to two, but after the fail.
>>
>> About the size, I use 50Gb because it's for a single webpage and I don't
>> need more space.
>>
>> I'll try to increase the memory to 3Gb.
>>
>> Greetings!!
>>
>>
>> On Wed, Jun 14, 2017 at 4:07 PM David Turner 
>> wrote:
>>
>>> Ceph recommends 1GB of RAM for ever 1TB of OSD space.  Your 2GB nodes
>>> are definitely on the low end.  50GB OSDs... I don't know what that will
>>> require, but where you're running the mon and mds on the same node, I'd
>>> still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
>>> surprising, even at that size.
>>>
>>> When you say you increased the size of the pools to 3, what did you do
>>> to the min_size?  Is that still set to 2?
>>>
>>> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
>>> wrote:
>>>
 Finally I've created three nodes, I've increased the size of pools to 3
 and I've created 3 MDS (active, standby, standby).

 Today the server has decided to fail and I've noticed that failover is
 not working... The ceph -s command shows like everything was OK but the
 clients weren't able to connect and I had to restart the failing node and
 reconect the clients manually to make it work again (even I think that the
 active MDS was in another node).

 I don't know if maybe is because the server was not fully down, and
 only some connections were failing. I'll do some tests too see.

 Another question: How many memory needs a node to work?, because I've
 nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high
 memory usage (more than 1GB on the OSD).
 The OSD size is 50GB and the data that contains is less than 3GB.

 Thanks, and Greetings!!

 2017-06-12 23:33 GMT+02:00 Mazzystr :

> Since your app is an Apache / php app is it possible for you to
> reconfigure the app to use S3 module rather than a posix open file()?  
> Then
> with Ceph drop CephFS and configure Civetweb S3 gateway?  You can have
> "active-active" endpoints with round robin dns or F5 or something.  You
> would also have to repopulate objects into the rados pools.
>
> Also increase that size parameter to 3.  ;-)
>
> Lots of work for 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-14 Thread Daniel Carrasco
It's strange, because on my test cluster (three nodes, two of them with an
OSD and all of them with a MON and MDS) I've configured size to 2 and min_size
to 1, I've restarted all nodes one by one, and the client only loses the
connection for about 5 seconds until it connects to another MDS.

Are you using the ceph client (ceph-fuse) or the kernel client?
I forgot to say that I'm using Debian 8.

Anyway, maybe the problem was what I said before: the clients' connections
to that node started to fail, but the node was not officially down. And
it wasn't a client problem, because it happened on both clients and on my
monitoring service at the same time.

Right now I'm not in the office, so I can't post the config file. Tomorrow
I'll send it.
Anyway, it's the basic file generated by ceph-deploy plus the client network
and min_size settings, just like my test config.

Thanks!!, and greetings!!

On 14 Jun. 2017, 10:38 p.m., "David Turner" wrote:

I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at a
time to do ceph and kernel upgrades.  The VM's running out of ceph, the
clients accessing MDS, etc all keep working fine without any problem during
these restarts.  What is your full ceph configuration?  There must be
something not quite right in there.

On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco 
wrote:

>
>
> On 14 Jun. 2017, 10:08 p.m., "David Turner" wrote:
>
> Not just the min_size of your cephfs data pool, but also your
> cephfs_metadata pool.
>
>
> Both were at 1. I don't know why because I don't remember to have changed
> the min_size and the cluster has 3 odd from beginning (I did it on
> another cluster for testing purposes, but I don't remember to have changed
> on this). I've changed both to two, but after the fail.
>
> About the size, I use 50Gb because it's for a single webpage and I don't
> need more space.
>
> I'll try to increase the memory to 3Gb.
>
> Greetings!!
>
>
> On Wed, Jun 14, 2017 at 4:07 PM David Turner 
> wrote:
>
>> Ceph recommends 1GB of RAM for ever 1TB of OSD space.  Your 2GB nodes are
>> definitely on the low end.  50GB OSDs... I don't know what that will
>> require, but where you're running the mon and mds on the same node, I'd
>> still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
>> surprising, even at that size.
>>
>> When you say you increased the size of the pools to 3, what did you do to
>> the min_size?  Is that still set to 2?
>>
>> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
>> wrote:
>>
>>> Finally I've created three nodes, I've increased the size of pools to 3
>>> and I've created 3 MDS (active, standby, standby).
>>>
>>> Today the server has decided to fail and I've noticed that failover is
>>> not working... The ceph -s command shows like everything was OK but the
>>> clients weren't able to connect and I had to restart the failing node and
>>> reconect the clients manually to make it work again (even I think that the
>>> active MDS was in another node).
>>>
>>> I don't know if maybe is because the server was not fully down, and only
>>> some connections were failing. I'll do some tests too see.
>>>
>>> Another question: How many memory needs a node to work?, because I've
>>> nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high
>>> memory usage (more than 1GB on the OSD).
>>> The OSD size is 50GB and the data that contains is less than 3GB.
>>>
>>> Thanks, and Greetings!!
>>>
>>> 2017-06-12 23:33 GMT+02:00 Mazzystr :
>>>
 Since your app is an Apache / php app is it possible for you to
 reconfigure the app to use S3 module rather than a posix open file()?  Then
 with Ceph drop CephFS and configure Civetweb S3 gateway?  You can have
 "active-active" endpoints with round robin dns or F5 or something.  You
 would also have to repopulate objects into the rados pools.

 Also increase that size parameter to 3.  ;-)

 Lots of work for active-active but the whole stack will be much more
 resilient coming from some with a ClearCase / NFS / stale file handles up
 the wazoo background



 On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco  wrote:

> 2017-06-12 16:10 GMT+02:00 David Turner :
>
>> I have an incredibly light-weight cephfs configuration.  I set up an
>> MDS on each mon (3 total), and have 9TB of data in cephfs.  This data 
>> only
>> has 1 client that reads a few files at a time.  I haven't noticed any
>> downtime when it fails over to a standby MDS.  So it definitely depends 
>> on
>> your workload as to how a failover will affect your environment.
>>
>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
>> wrote:
>>
>>> We use the following in our ceph.conf for MDS failover. We're
>>> running one active and 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-14 Thread David Turner
I have 3 Ceph nodes, size 3, min_size 2, and I can restart them all, 1 at a
time, to do Ceph and kernel upgrades.  The VMs running out of Ceph, the
clients accessing the MDS, etc. all keep working fine without any problems
during these restarts.  What is your full Ceph configuration?  There must be
something not quite right in there.

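For reference, the usual way to do that kind of rolling restart without the
cluster starting to rebalance while a node is down (just the conventional
pattern, not something specific to this thread) is:

ceph osd set noout
# restart/upgrade the node, wait until all OSDs are back up
ceph osd unset noout
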
On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco 
wrote:

>
>
> On 14 Jun. 2017, 10:08 p.m., "David Turner" wrote:
>
> Not just the min_size of your cephfs data pool, but also your
> cephfs_metadata pool.
>
>
> Both were at 1. I don't know why because I don't remember to have changed
> the min_size and the cluster has 3 odd from beginning (I did it on
> another cluster for testing purposes, but I don't remember to have changed
> on this). I've changed both to two, but after the fail.
>
> About the size, I use 50Gb because it's for a single webpage and I don't
> need more space.
>
> I'll try to increase the memory to 3Gb.
>
> Greetings!!
>
>
> On Wed, Jun 14, 2017 at 4:07 PM David Turner 
> wrote:
>
>> Ceph recommends 1GB of RAM for ever 1TB of OSD space.  Your 2GB nodes are
>> definitely on the low end.  50GB OSDs... I don't know what that will
>> require, but where you're running the mon and mds on the same node, I'd
>> still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
>> surprising, even at that size.
>>
>> When you say you increased the size of the pools to 3, what did you do to
>> the min_size?  Is that still set to 2?
>>
>> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
>> wrote:
>>
>>> Finally I've created three nodes, I've increased the size of pools to 3
>>> and I've created 3 MDS (active, standby, standby).
>>>
>>> Today the server has decided to fail and I've noticed that failover is
>>> not working... The ceph -s command shows like everything was OK but the
>>> clients weren't able to connect and I had to restart the failing node and
>>> reconect the clients manually to make it work again (even I think that the
>>> active MDS was in another node).
>>>
>>> I don't know if maybe is because the server was not fully down, and only
>>> some connections were failing. I'll do some tests too see.
>>>
>>> Another question: How many memory needs a node to work?, because I've
>>> nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high
>>> memory usage (more than 1GB on the OSD).
>>> The OSD size is 50GB and the data that contains is less than 3GB.
>>>
>>> Thanks, and Greetings!!
>>>
>>> 2017-06-12 23:33 GMT+02:00 Mazzystr :
>>>
 Since your app is an Apache / php app is it possible for you to
 reconfigure the app to use S3 module rather than a posix open file()?  Then
 with Ceph drop CephFS and configure Civetweb S3 gateway?  You can have
 "active-active" endpoints with round robin dns or F5 or something.  You
 would also have to repopulate objects into the rados pools.

 Also increase that size parameter to 3.  ;-)

 Lots of work for active-active but the whole stack will be much more
 resilient coming from some with a ClearCase / NFS / stale file handles up
 the wazoo background



 On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco  wrote:

> 2017-06-12 16:10 GMT+02:00 David Turner :
>
>> I have an incredibly light-weight cephfs configuration.  I set up an
>> MDS on each mon (3 total), and have 9TB of data in cephfs.  This data 
>> only
>> has 1 client that reads a few files at a time.  I haven't noticed any
>> downtime when it fails over to a standby MDS.  So it definitely depends 
>> on
>> your workload as to how a failover will affect your environment.
>>
>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
>> wrote:
>>
>>> We use the following in our ceph.conf for MDS failover. We're
>>> running one active and one standby. Last time it failed over there was
>>> about 2 minutes of downtime before the mounts started responding again 
>>> but
>>> it did recover gracefully.
>>>
>>> [mds]
>>> max_mds = 1
>>> mds_standby_for_rank = 0
>>> mds_standby_replay = true
>>>
>>> ___
>>>
>>> John Petrini
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
> Thanks to both.
> Just now i'm working on that because I needs a very fast failover. For
> now the tests give me a very fast response when an OSD fails (about 5
> seconds), but a very slow response when the main MDS fails (I've not 
> tested
> the real time, but was not working for a long time). Maybe was because I
> created the other MDS after mount, because I've 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-14 Thread Daniel Carrasco
On 14 Jun. 2017, 10:08 p.m., "David Turner" wrote:

Not just the min_size of your cephfs data pool, but also your
cephfs_metadata pool.


Both were at 1. I don't know why, because I don't remember having changed
the min_size, and the cluster has had 3 OSDs from the beginning (I did change
it on another cluster for testing purposes, but I don't remember changing it
on this one). I've changed both to 2, but only after the failure.

About the size, I use 50 GB because it's for a single webpage and I don't
need more space.

I'll try to increase the memory to 3 GB.

Greetings!!


On Wed, Jun 14, 2017 at 4:07 PM David Turner  wrote:

> Ceph recommends 1GB of RAM for ever 1TB of OSD space.  Your 2GB nodes are
> definitely on the low end.  50GB OSDs... I don't know what that will
> require, but where you're running the mon and mds on the same node, I'd
> still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
> surprising, even at that size.
>
> When you say you increased the size of the pools to 3, what did you do to
> the min_size?  Is that still set to 2?
>
> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
> wrote:
>
>> Finally I've created three nodes, I've increased the size of pools to 3
>> and I've created 3 MDS (active, standby, standby).
>>
>> Today the server has decided to fail and I've noticed that failover is
>> not working... The ceph -s command shows like everything was OK but the
>> clients weren't able to connect and I had to restart the failing node and
>> reconect the clients manually to make it work again (even I think that the
>> active MDS was in another node).
>>
>> I don't know if maybe is because the server was not fully down, and only
>> some connections were failing. I'll do some tests too see.
>>
>> Another question: How many memory needs a node to work?, because I've
>> nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high
>> memory usage (more than 1GB on the OSD).
>> The OSD size is 50GB and the data that contains is less than 3GB.
>>
>> Thanks, and Greetings!!
>>
>> 2017-06-12 23:33 GMT+02:00 Mazzystr :
>>
>>> Since your app is an Apache / php app is it possible for you to
>>> reconfigure the app to use S3 module rather than a posix open file()?  Then
>>> with Ceph drop CephFS and configure Civetweb S3 gateway?  You can have
>>> "active-active" endpoints with round robin dns or F5 or something.  You
>>> would also have to repopulate objects into the rados pools.
>>>
>>> Also increase that size parameter to 3.  ;-)
>>>
>>> Lots of work for active-active but the whole stack will be much more
>>> resilient coming from some with a ClearCase / NFS / stale file handles up
>>> the wazoo background
>>>
>>>
>>>
>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco 
>>> wrote:
>>>
 2017-06-12 16:10 GMT+02:00 David Turner :

> I have an incredibly light-weight cephfs configuration.  I set up an
> MDS on each mon (3 total), and have 9TB of data in cephfs.  This data only
> has 1 client that reads a few files at a time.  I haven't noticed any
> downtime when it fails over to a standby MDS.  So it definitely depends on
> your workload as to how a failover will affect your environment.
>
> On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
> wrote:
>
>> We use the following in our ceph.conf for MDS failover. We're running
>> one active and one standby. Last time it failed over there was about 2
>> minutes of downtime before the mounts started responding again but it did
>> recover gracefully.
>>
>> [mds]
>> max_mds = 1
>> mds_standby_for_rank = 0
>> mds_standby_replay = true
>>
>> ___
>>
>> John Petrini
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>

 Thanks to both.
 Just now i'm working on that because I needs a very fast failover. For
 now the tests give me a very fast response when an OSD fails (about 5
 seconds), but a very slow response when the main MDS fails (I've not tested
 the real time, but was not working for a long time). Maybe was because I
 created the other MDS after mount, because I've done some test just before
 send this email and now looks very fast (i've not noticed the downtime).

 Greetings!!


 --
 _

   Daniel Carrasco Marín
   Ingeniería para la Innovación i2TIC, S.L.
   Tlf:  +34 911 12 32 84 Ext: 223
   www.i2tic.com
 _

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-14 Thread David Turner
Not just the min_size of your cephfs data pool, but also your
cephfs_metadata pool.

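Assuming the usual CephFS pool names cephfs_data and cephfs_metadata (yours may
differ; ceph osd lspools will list them), the values can be inspected and
changed like this:

ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size
ceph osd pool set cephfs_data min_size 2
ceph osd pool get cephfs_metadata min_size
ceph osd pool set cephfs_metadata min_size 2
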
On Wed, Jun 14, 2017 at 4:07 PM David Turner  wrote:

> Ceph recommends 1GB of RAM for ever 1TB of OSD space.  Your 2GB nodes are
> definitely on the low end.  50GB OSDs... I don't know what that will
> require, but where you're running the mon and mds on the same node, I'd
> still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
> surprising, even at that size.
>
> When you say you increased the size of the pools to 3, what did you do to
> the min_size?  Is that still set to 2?
>
> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
> wrote:
>
>> Finally I've created three nodes, I've increased the size of pools to 3
>> and I've created 3 MDS (active, standby, standby).
>>
>> Today the server has decided to fail and I've noticed that failover is
>> not working... The ceph -s command shows like everything was OK but the
>> clients weren't able to connect and I had to restart the failing node and
>> reconect the clients manually to make it work again (even I think that the
>> active MDS was in another node).
>>
>> I don't know if maybe is because the server was not fully down, and only
>> some connections were failing. I'll do some tests too see.
>>
>> Another question: How many memory needs a node to work?, because I've
>> nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high
>> memory usage (more than 1GB on the OSD).
>> The OSD size is 50GB and the data that contains is less than 3GB.
>>
>> Thanks, and Greetings!!
>>
>> 2017-06-12 23:33 GMT+02:00 Mazzystr :
>>
>>> Since your app is an Apache / php app is it possible for you to
>>> reconfigure the app to use S3 module rather than a posix open file()?  Then
>>> with Ceph drop CephFS and configure Civetweb S3 gateway?  You can have
>>> "active-active" endpoints with round robin dns or F5 or something.  You
>>> would also have to repopulate objects into the rados pools.
>>>
>>> Also increase that size parameter to 3.  ;-)
>>>
>>> Lots of work for active-active but the whole stack will be much more
>>> resilient coming from some with a ClearCase / NFS / stale file handles up
>>> the wazoo background
>>>
>>>
>>>
>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco 
>>> wrote:
>>>
 2017-06-12 16:10 GMT+02:00 David Turner :

> I have an incredibly light-weight cephfs configuration.  I set up an
> MDS on each mon (3 total), and have 9TB of data in cephfs.  This data only
> has 1 client that reads a few files at a time.  I haven't noticed any
> downtime when it fails over to a standby MDS.  So it definitely depends on
> your workload as to how a failover will affect your environment.
>
> On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
> wrote:
>
>> We use the following in our ceph.conf for MDS failover. We're running
>> one active and one standby. Last time it failed over there was about 2
>> minutes of downtime before the mounts started responding again but it did
>> recover gracefully.
>>
>> [mds]
>> max_mds = 1
>> mds_standby_for_rank = 0
>> mds_standby_replay = true
>>
>> ___
>>
>> John Petrini
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>

 Thanks to both.
 Just now i'm working on that because I needs a very fast failover. For
 now the tests give me a very fast response when an OSD fails (about 5
 seconds), but a very slow response when the main MDS fails (I've not tested
 the real time, but was not working for a long time). Maybe was because I
 created the other MDS after mount, because I've done some test just before
 send this email and now looks very fast (i've not noticed the downtime).

 Greetings!!


 --
 _

   Daniel Carrasco Marín
   Ingeniería para la Innovación i2TIC, S.L.
   Tlf:  +34 911 12 32 84 Ext: 223
   www.i2tic.com
 _

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>
>>
>>
>> --
>> _
>>
>>   Daniel Carrasco Marín
>>   Ingeniería para la Innovación i2TIC, S.L.
>>   Tlf:  +34 911 12 32 84 Ext: 223
>>   www.i2tic.com
>> _
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users 

Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-14 Thread David Turner
Ceph recommends 1GB of RAM for every 1TB of OSD space.  Your 2GB nodes are
definitely on the low end.  50GB OSDs... I don't know what that will
require, but since you're running the mon and mds on the same node, I'd
still say that 2GB is low.  The Ceph OSD daemon using 1GB of RAM is not
surprising, even at that size.

When you say you increased the size of the pools to 3, what did you do to
the min_size?  Is that still set to 2?

On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco 
wrote:

> Finally I've created three nodes, I've increased the size of pools to 3
> and I've created 3 MDS (active, standby, standby).
>
> Today the server has decided to fail and I've noticed that failover is not
> working... The ceph -s command shows like everything was OK but the clients
> weren't able to connect and I had to restart the failing node and reconect
> the clients manually to make it work again (even I think that the active
> MDS was in another node).
>
> I don't know if maybe is because the server was not fully down, and only
> some connections were failing. I'll do some tests too see.
>
> Another question: How many memory needs a node to work?, because I've
> nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high
> memory usage (more than 1GB on the OSD).
> The OSD size is 50GB and the data that contains is less than 3GB.
>
> Thanks, and Greetings!!
>
> 2017-06-12 23:33 GMT+02:00 Mazzystr :
>
>> Since your app is an Apache / php app is it possible for you to
>> reconfigure the app to use S3 module rather than a posix open file()?  Then
>> with Ceph drop CephFS and configure Civetweb S3 gateway?  You can have
>> "active-active" endpoints with round robin dns or F5 or something.  You
>> would also have to repopulate objects into the rados pools.
>>
>> Also increase that size parameter to 3.  ;-)
>>
>> Lots of work for active-active but the whole stack will be much more
>> resilient coming from some with a ClearCase / NFS / stale file handles up
>> the wazoo background
>>
>>
>>
>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco 
>> wrote:
>>
>>> 2017-06-12 16:10 GMT+02:00 David Turner :
>>>
 I have an incredibly light-weight cephfs configuration.  I set up an
 MDS on each mon (3 total), and have 9TB of data in cephfs.  This data only
 has 1 client that reads a few files at a time.  I haven't noticed any
 downtime when it fails over to a standby MDS.  So it definitely depends on
 your workload as to how a failover will affect your environment.

 On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
 wrote:

> We use the following in our ceph.conf for MDS failover. We're running
> one active and one standby. Last time it failed over there was about 2
> minutes of downtime before the mounts started responding again but it did
> recover gracefully.
>
> [mds]
> max_mds = 1
> mds_standby_for_rank = 0
> mds_standby_replay = true
>
> ___
>
> John Petrini
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

>>>
>>> Thanks to both.
>>> Just now i'm working on that because I needs a very fast failover. For
>>> now the tests give me a very fast response when an OSD fails (about 5
>>> seconds), but a very slow response when the main MDS fails (I've not tested
>>> the real time, but was not working for a long time). Maybe was because I
>>> created the other MDS after mount, because I've done some test just before
>>> send this email and now looks very fast (i've not noticed the downtime).
>>>
>>> Greetings!!
>>>
>>>
>>> --
>>> _
>>>
>>>   Daniel Carrasco Marín
>>>   Ingeniería para la Innovación i2TIC, S.L.
>>>   Tlf:  +34 911 12 32 84 Ext: 223
>>>   www.i2tic.com
>>> _
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>
>
> --
> _
>
>   Daniel Carrasco Marín
>   Ingeniería para la Innovación i2TIC, S.L.
>   Tlf:  +34 911 12 32 84 Ext: 223
>   www.i2tic.com
> _
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-14 Thread Daniel Carrasco
Finally I've created three nodes, increased the size of the pools to 3, and
created 3 MDS daemons (active, standby, standby).

Today the server decided to fail and I've noticed that failover is not
working... The ceph -s command showed everything as OK, but the clients
weren't able to connect and I had to restart the failing node and reconnect
the clients manually to make it work again (even though I think the active
MDS was on another node).

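For reference, this is roughly how I check which MDS is active and which are on
standby (the output format differs a bit between releases):

ceph mds stat
ceph -s
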
I don't know if maybe it's because the server was not fully down and only
some connections were failing. I'll do some tests to see.

Another question: how much memory does a node need to work? I have nodes
with 2GB of RAM (one MDS, one MON and one OSD each), and they have a high
memory usage (more than 1GB on the OSD).
The OSD size is 50GB and the data it contains is less than 3GB.

Thanks, and Greetings!!

2017-06-12 23:33 GMT+02:00 Mazzystr :

> Since your app is an Apache / php app is it possible for you to
> reconfigure the app to use S3 module rather than a posix open file()?  Then
> with Ceph drop CephFS and configure Civetweb S3 gateway?  You can have
> "active-active" endpoints with round robin dns or F5 or something.  You
> would also have to repopulate objects into the rados pools.
>
> Also increase that size parameter to 3.  ;-)
>
> Lots of work for active-active but the whole stack will be much more
> resilient coming from some with a ClearCase / NFS / stale file handles up
> the wazoo background
>
>
>
> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco 
> wrote:
>
>> 2017-06-12 16:10 GMT+02:00 David Turner :
>>
>>> I have an incredibly light-weight cephfs configuration.  I set up an MDS
>>> on each mon (3 total), and have 9TB of data in cephfs.  This data only has
>>> 1 client that reads a few files at a time.  I haven't noticed any downtime
>>> when it fails over to a standby MDS.  So it definitely depends on your
>>> workload as to how a failover will affect your environment.
>>>
>>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
>>> wrote:
>>>
 We use the following in our ceph.conf for MDS failover. We're running
 one active and one standby. Last time it failed over there was about 2
 minutes of downtime before the mounts started responding again but it did
 recover gracefully.

 [mds]
 max_mds = 1
 mds_standby_for_rank = 0
 mds_standby_replay = true

 ___

 John Petrini
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>>>
>>
>> Thanks to both.
>> Just now i'm working on that because I needs a very fast failover. For
>> now the tests give me a very fast response when an OSD fails (about 5
>> seconds), but a very slow response when the main MDS fails (I've not tested
>> the real time, but was not working for a long time). Maybe was because I
>> created the other MDS after mount, because I've done some test just before
>> send this email and now looks very fast (i've not noticed the downtime).
>>
>> Greetings!!
>>
>>
>> --
>> _
>>
>>   Daniel Carrasco Marín
>>   Ingeniería para la Innovación i2TIC, S.L.
>>   Tlf:  +34 911 12 32 84 Ext: 223
>>   www.i2tic.com
>> _
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>


-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.

2017-06-12 Thread Mazzystr
Since your app is an Apache / PHP app, is it possible for you to reconfigure
the app to use an S3 module rather than a POSIX open()/file()?  Then, with
Ceph, drop CephFS and configure the Civetweb S3 gateway?  You can have
"active-active" endpoints with round-robin DNS or an F5 or something.  You
would also have to repopulate the objects into the RADOS pools.

Also, increase that size parameter to 3.  ;-)

It's a lot of work for active-active, but the whole stack will be much more
resilient, coming from someone with a ClearCase / NFS / stale-file-handles-up-the-wazoo
background.

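A very rough sketch of what the gateway side of that could look like in
ceph.conf (the client name, port and keyring path here are illustrative, not
taken from this thread), with one such section per gateway host behind the
round-robin DNS:

[client.rgw.gateway-1]
rgw frontends = "civetweb port=7480"
keyring = /etc/ceph/ceph.client.rgw.gateway-1.keyring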


On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco 
wrote:

> 2017-06-12 16:10 GMT+02:00 David Turner :
>
>> I have an incredibly light-weight cephfs configuration.  I set up an MDS
>> on each mon (3 total), and have 9TB of data in cephfs.  This data only has
>> 1 client that reads a few files at a time.  I haven't noticed any downtime
>> when it fails over to a standby MDS.  So it definitely depends on your
>> workload as to how a failover will affect your environment.
>>
>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
>> wrote:
>>
>>> We use the following in our ceph.conf for MDS failover. We're running
>>> one active and one standby. Last time it failed over there was about 2
>>> minutes of downtime before the mounts started responding again but it did
>>> recover gracefully.
>>>
>>> [mds]
>>> max_mds = 1
>>> mds_standby_for_rank = 0
>>> mds_standby_replay = true
>>>
>>> ___
>>>
>>> John Petrini
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>
> Thanks to both.
> I'm working on that right now because I need a very fast failover. For now
> the tests give me a very fast response when an OSD fails (about 5 seconds),
> but a very slow response when the main MDS fails (I haven't measured the
> real time, but it wasn't working for quite a while). Maybe that was because
> I created the other MDS after mounting; I ran some tests just before sending
> this email and now it looks very fast (I haven't noticed any downtime).
>
> Greetings!!
>
>
> --
> _
>
>   Daniel Carrasco Marín
>   Ingeniería para la Innovación i2TIC, S.L.
>   Tlf:  +34 911 12 32 84 Ext: 223
>   www.i2tic.com
> _
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HAof MDS daemon.

2017-06-12 Thread Daniel Carrasco
2017-06-12 16:10 GMT+02:00 David Turner :

> I have an incredibly light-weight cephfs configuration.  I set up an MDS
> on each mon (3 total), and have 9TB of data in cephfs.  This data only has
> 1 client that reads a few files at a time.  I haven't noticed any downtime
> when it fails over to a standby MDS.  So it definitely depends on your
> workload as to how a failover will affect your environment.
>
> On Mon, Jun 12, 2017 at 9:59 AM John Petrini 
> wrote:
>
>> We use the following in our ceph.conf for MDS failover. We're running one
>> active and one standby. Last time it failed over there was about 2 minutes
>> of downtime before the mounts started responding again but it did recover
>> gracefully.
>>
>> [mds]
>> max_mds = 1
>> mds_standby_for_rank = 0
>> mds_standby_replay = true
>>
>> ___
>>
>> John Petrini
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>

Thanks to both.
I'm working on that right now because I need a very fast failover. For now
the tests give me a very fast response when an OSD fails (about 5 seconds),
but a very slow response when the main MDS fails (I haven't measured the real
time, but it wasn't working for quite a while). Maybe that was because I
created the other MDS after mounting; I ran some tests just before sending
this email and now it looks very fast (I haven't noticed any downtime).
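
In case it is useful, a rough way to measure the MDS failover time, assuming
a systemd deployment and placeholder names (an MDS daemon "fs-01" and a mount
at /mnt/cephfs):

# Check that a standby MDS is registered before stopping the active one:
ceph mds stat

# Stop the active MDS and watch the standby take over:
systemctl stop ceph-mds@fs-01
ceph -w

# From a client, time how long metadata operations block during the takeover:
time ls /mnt/cephfs

# Bring the old daemon back; it rejoins as the new standby:
systemctl start ceph-mds@fs-01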

Greetings!!


-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HAof MDS daemon.

2017-06-12 Thread David Turner
I have an incredibly light-weight cephfs configuration.  I set up an MDS on
each mon (3 total), and have 9TB of data in cephfs.  This data only has 1
client that reads a few files at a time.  I haven't noticed any downtime
when it fails over to a standby MDS.  So it definitely depends on your
workload as to how a failover will affect your environment.
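
In case it helps, that layout is quick to reproduce; a sketch with
ceph-deploy, using placeholder hostnames:

# Add an MDS daemon on each monitor host (run from the deploy/admin node):
ceph-deploy mds create mon-01 mon-02 mon-03

# One daemon becomes active; the others register as standbys:
ceph mds stat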

On Mon, Jun 12, 2017 at 9:59 AM John Petrini  wrote:

> We use the following in our ceph.conf for MDS failover. We're running one
> active and one standby. Last time it failed over there was about 2 minutes
> of downtime before the mounts started responding again but it did recover
> gracefully.
>
> [mds]
> max_mds = 1
> mds_standby_for_rank = 0
> mds_standby_replay = true
>
> ___
>
> John Petrini
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HAof MDS daemon.

2017-06-12 Thread John Petrini
We use the following in our ceph.conf for MDS failover. We're running one
active and one standby. Last time it failed over there was about 2 minutes
of downtime before the mounts started responding again but it did recover
gracefully.

[mds]
max_mds = 1
mds_standby_for_rank = 0
mds_standby_replay = true

___

John Petrini
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HAof MDS daemon.

2017-06-12 Thread Daniel Carrasco
2017-06-12 10:49 GMT+02:00 Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
>
> On 06/12/2017 10:31 AM, Daniel Carrasco wrote:
>
>> Hello,
>>
>> I'm very new to Ceph, so maybe this is a noob question.
>>
>> We have an architecture with several web servers (nginx, PHP...) sharing a
>> common file server over NFS. Of course that is a SPOF, so we want to move
>> to a redundant, multi-node filesystem to avoid future problems.
>>
>> We've already tested GlusterFS, but it is very slow at reading small files
>> with the official client (from 600 ms to 1700 ms to read the docs page),
>> and through NFS-Ganesha it fails a lot (permission errors, 404s for files
>> that exist...).
>> The next one we're trying is Ceph, which looks very good and performs well
>> even with small files (close to NFS performance: 90-100 ms versus
>> 100-120 ms), but in some of the tests I've done it stops working when an
>> OSD is down.
>>
>> My test architecture is two servers with one OSD and one MON each, and a
>> third with a MON and an MDS. I've configured the cluster to keep two copies
>> of every PG (just like RAID 1) and everything looks fine (health OK, three
>> monitors...).
>> My test client also works fine: it connects to the cluster and can serve
>> the webpage without problems, but the trouble starts when an OSD goes down.
>> The cluster detects that it is down, reports that it needs more OSDs to
>> keep the two copies, designates a new MON and appears to keep working, but
>> the client cannot read new files until I power the OSD back on (it happens
>> with both OSDs).
>>
>> My question is: is there any way to tell Ceph to keep serving files even
>> when an OSD is down?
>>
>
> I assume the data pool is configured with size=2 and min_size=2. This
> means that you need two active replicas to allow I/O to a PG. With one
> OSD down this requirement cannot be met.
>
> You can either:
> - add a third OSD
> - set min_size=1


> The latter might be fine for a test setup, but do not run this configuration
> in production. NEVER. EVER. Search the mailing list for more details.


Thanks!! Just what I thought, a noob question, hehe. It's working now.
I'll search the list later, but it looks like it's there to avoid split-brain
or similar issues.



>
>
>
>
>>
>> My other question is about the MDS:
>> Is a multi-MDS environment stable? Because if I have multiple file servers
>> to avoid a SPOF and I can only deploy one MDS, then we have a new SPOF...
>> This is to find out whether I need to use block device (RBD) pools instead
>> of filesystem pools.
>>
>
> AFAIK active/active MDS setups are still considered experimental;
> active/standby(-replay) is a supported setup. We currently use one active
> and one standby-replay MDS for our CephFS instance serving several million
> files.
>
> Failover between the MDS daemons works, but might run into problems with a
> large number of open files (each requiring a stat operation). Depending on
> the number of open files, failover takes anywhere from a few seconds up to
> 5-10 minutes in our setup.
>

Thanks again for your response.
It's not for performance purposes, so an active/standby setup will be enough.
I'll read up on that configuration.

Regarding the failover time, it's always better to have the page down for a
few seconds than to wait for an admin to fix it.



>
> Regards,
> Burkhard Linke
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



Greetings!!

-- 
_

  Daniel Carrasco Marín
  Ingeniería para la Innovación i2TIC, S.L.
  Tlf:  +34 911 12 32 84 Ext: 223
  www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HAof MDS daemon.

2017-06-12 Thread Burkhard Linke

Hi,


On 06/12/2017 10:31 AM, Daniel Carrasco wrote:

Hello,

I'm very new to Ceph, so maybe this is a noob question.

We have an architecture with several web servers (nginx, PHP...) sharing a
common file server over NFS. Of course that is a SPOF, so we want to move to
a redundant, multi-node filesystem to avoid future problems.

We've already tested GlusterFS, but it is very slow at reading small files
with the official client (from 600 ms to 1700 ms to read the docs page), and
through NFS-Ganesha it fails a lot (permission errors, 404s for files that
exist...).
The next one we're trying is Ceph, which looks very good and performs well
even with small files (close to NFS performance: 90-100 ms versus 100-120 ms),
but in some of the tests I've done it stops working when an OSD is down.

My test architecture is two servers with one OSD and one MON each, and a
third with a MON and an MDS. I've configured the cluster to keep two copies
of every PG (just like RAID 1) and everything looks fine (health OK, three
monitors...).
My test client also works fine: it connects to the cluster and can serve the
webpage without problems, but the trouble starts when an OSD goes down. The
cluster detects that it is down, reports that it needs more OSDs to keep the
two copies, designates a new MON and appears to keep working, but the client
cannot read new files until I power the OSD back on (it happens with both
OSDs).

My question is: is there any way to tell Ceph to keep serving files even when
an OSD is down?


I assume the data pool is configured with size=2 and min_size=2. This
means that you need two active replicas to allow I/O to a PG. With one
OSD down this requirement cannot be met.


You can either:
- add a third OSD
- set min_size=1

The latter might be fine for a test setup, but do not run this
configuration in production. NEVER. EVER. Search the mailing list for
more details.
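
For completeness, these are the knobs involved; the pool names below are the
usual CephFS defaults and may differ on your cluster:

# Inspect the current replication settings:
ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size

# Test setups only (see the warning above):
ceph osd pool set cephfs_data min_size 1
ceph osd pool set cephfs_metadata min_size 1

# With a third OSD in place you could also raise the replica count, as
# suggested elsewhere in this thread:
ceph osd pool set cephfs_data size 3
ceph osd pool set cephfs_metadata size 3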






My other question is about the MDS:
Is a multi-MDS environment stable? Because if I have multiple file servers
to avoid a SPOF and I can only deploy one MDS, then we have a new SPOF...
This is to find out whether I need to use block device (RBD) pools instead
of filesystem pools.


AFAIK active/active MDS setups are still considered experimental; 
active/standby(-replay) is a supported setup. We currently use one 
active and one standby-replay MDS for our CephFS instance serving 
several million files.


Failover between the MDS daemons works, but might run into problems with a
large number of open files (each requiring a stat operation). Depending on
the number of open files, failover takes anywhere from a few seconds up to
5-10 minutes in our setup.
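
If you want a rough idea beforehand of how much state a failover would have
to replay, the admin socket of the active MDS can be queried on its host (the
daemon name below is a placeholder):

# List client sessions, including the number of caps each one holds:
ceph daemon mds.fs-01 session ls

# Dump the MDS performance counters; the inode and cap counts give a feel
# for the amount of metadata a standby would have to reload:
ceph daemon mds.fs-01 perf dump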


Regards,
Burkhard Linke


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com