Re: Can OpenShift keep track of absolutely all service activity in a High Availability (many replicas) scenario?

2016-10-12 Thread Jonathan Yu
On Wed, Oct 12, 2016 at 8:42 AM, Ricardo Aguirre Reyes | BEEVA MX <
ricardo.aguirre.contrac...@beeva.com> wrote:

> Hi,
>
> I have a question regarding OpenShift's ability to keep track of
> absolutely all service activity in a High Availability (many replicas)
> scenario.
>
>
> We are building a microservice that will communicate over TCP
> (sockets) with a mainframe.
> We will run several pods as replicas in order to achieve High
> Availability.
> We know that, configured this way, each transaction can be
> logged by the answering pod.
> Then we can store the log messages in Elasticsearch and theoretically we
> can get logs even from dead pods (is this true?); we can aggregate them
> based on application labels.
>

Yes, things can crash, and you can have situations where a pod is healthy
(according to its health checks), accepts a request for processing, and
subsequently fails.


> With multiple pods, messages will never be dropped on the floor,
> because at least one pod will be up to answer.
>

It may be useful to think about the failure modes in the path between your
microservices and your mainframe service:

1. Top of rack switch failure
2. Cable failure
3. Power failure
4. Router pod failure
5. OpenShift node failure (application pod failure)

There are a lot of things to consider when building reliable,
high-performance distributed systems. This checklist is helpful:
https://monkey.org/~marius/checklist.pdf

Keep in mind that TCP has checksum and retry mechanisms (handling line noise,
dropped packets, and transient network blips), but they do not handle
re-opening a broken connection and re-trying requests automatically.
Therefore, your service will need to handle this itself; a rough sketch
follows. And there's no such thing as an exactly-once system, so your
requests should be idempotent.
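
Here is a minimal Python sketch of that reconnect-and-retry pattern. The host,
port, newline-delimited framing, and the idempotency key prefix are all
assumptions for illustration, not details of any real mainframe protocol:

import socket
import time
import uuid

MAINFRAME_HOST = "mainframe.example.com"  # placeholder endpoint
MAINFRAME_PORT = 7001                     # placeholder port

def send_with_retry(payload: bytes, attempts: int = 3) -> bytes:
    # Reuse the same idempotency key on every attempt so the receiving side
    # (or a fronting service) can recognize and discard duplicates.
    request_id = uuid.uuid4().hex.encode()
    message = request_id + b" " + payload + b"\n"
    for attempt in range(attempts):
        try:
            with socket.create_connection(
                    (MAINFRAME_HOST, MAINFRAME_PORT), timeout=10) as conn:
                conn.sendall(message)
                reply = conn.makefile("rb").readline()  # assumes one-line replies
                if reply:
                    return reply
        except OSError:
            pass  # connection broke: fall through and reconnect
        time.sleep(2 ** attempt)  # back off before re-opening the connection
    raise RuntimeError("no reply after %d attempts" % attempts)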


> But we do not know what happens if, for example, a message was already
> assigned to pod1 and then it goes down before receiving the reply
> from the destination.
>

If you open a connection and initiate a request, but your process crashes
before the response is received, then the destination server will send a
reply that your operating system no longer knows how to deliver, since
nothing is holding the socket open anymore.  This manifests as a TCP RST
(Reset) being sent to the mainframe.

This wouldn't account for the case where a pod receives a reply but crashes
before completing its processing - for that, you need something more
sophisticated, such as journaling each request's state (sketched below).
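
A minimal sketch of such journaling, assuming a hypothetical journal file (in
practice it should live on shared or replicated storage): write a "sent"
record before the request goes out and a "done" record only after the reply
has been fully processed, so a restarted replica can find in-flight requests.

import json
import time

JOURNAL_PATH = "/var/journal/requests.log"  # hypothetical path

def record(event: str, request_id: str) -> None:
    # Append-only journal entry: "sent" before the request goes out,
    # "done" only after the reply has been processed and logged.
    with open(JOURNAL_PATH, "a") as journal:
        journal.write(json.dumps(
            {"ts": time.time(), "event": event, "id": request_id}) + "\n")

def unfinished_requests() -> set:
    # Requests that were sent but never marked done -- candidates for a
    # restarted replica to re-drive or reconcile against the mainframe.
    started, finished = set(), set()
    try:
        with open(JOURNAL_PATH) as journal:
            for line in journal:
                entry = json.loads(line)
                (started if entry["event"] == "sent" else finished).add(entry["id"])
    except FileNotFoundError:
        pass
    return started - finished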

> Will the OpenShift High Availability mechanism resend the last message
> to another available pod, since "it knows" that the first one is down?
>
> The problem is that my service cannot lose any message and it must record
> every activity.
>

I think many people have had success using Apache Kafka, which is a
distributed message queue (more precisely, a replicated commit log). It
persists messages for a configurable retention period, allowing your
application to replay messages and ensure that nothing gets dropped. A rough
sketch of the consuming side is below.
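
As a sketch, assuming the kafka-python client and a hypothetical
"mainframe-requests" topic (any Kafka client and topic layout would do): all
replicas join one consumer group, and each message's offset is committed only
after the mainframe reply has been processed and logged, so work in flight on
a crashed replica is redelivered to another one.

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "mainframe-requests",              # hypothetical topic name
    bootstrap_servers=["kafka:9092"],  # hypothetical broker address
    group_id="mainframe-bridge",       # all replicas share one consumer group
    enable_auto_commit=False,          # commit only after successful processing
)

for message in consumer:
    reply = send_with_retry(message.value)  # from the earlier sketch
    # log_transaction is a placeholder for whatever records the activity,
    # e.g. shipping a structured entry to Elasticsearch.
    log_transaction(message.value, reply)
    consumer.commit()  # only now is the message considered handled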

-- 
Jonathan Yu, P.Eng. / Software Engineer, OpenShift by Red Hat / Twitter
(@jawnsy) is the quickest way to my heart 

*“A master in the art of living draws no sharp distinction between his work
and his play; his labor and his leisure; his mind and his body; his
education and his recreation. He hardly knows which is which. He simply
pursues his vision of excellence through whatever he is doing, and leaves
others to determine whether he is working or playing. To himself, he always
appears to be doing both.”* — L. P. Jacks, Education through Recreation
(1932), p. 1
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Can OpenShift keep track of absolutely all service activity in a High Availability (many replicas) scenario?

2016-10-12 Thread Ricardo Aguirre Reyes | BEEVA MX
Hi,

I have a question regarding OpenShift's ability to keep track of absolutely
all service activity in a High Availability (many replicas) scenario.


We are building a microservice that will communicate over TCP
(sockets) with a mainframe.
We will run several pods as replicas in order to achieve High
Availability.
We know that, configured this way, each transaction can be logged
by the answering pod.
Then we can store the log messages in Elasticsearch and theoretically we
can get logs even from dead pods (is this true?); we can aggregate them
based on application labels.
With multiple pods, messages will never be dropped on the floor,
because at least one pod will be up to answer.

But we do not know what happens if, for example, a message was already assigned
to pod1 and then it goes down before receiving the reply from the
destination.
Will the OpenShift High Availability mechanism resend the last message
to another available pod, since "it knows" that the first one is down?

The problem is that my service cannot lose any message and it must record
every activity.

Can someone help me to clarify?

Thank you in advance

-- 


Ricardo Aguirre
Software Engineer
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev