Re: Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions

Martijn Visser Mon, 20 Jun 2022 05:13:01 -0700

The Istio guideline implies that this is a guidance, not a standard. Is
that correct? Is there a standard (already)? I think we should follow a
standard as Flink and avoid implementing guidelines from different
vendors/providers.
Op ma 20 jun. 2022 om 13:36 schreef Nathan Fisher <nfis...@junctionbox.ca>:


> Would it make sense to add the annotations to the task manager and job
> manager? In a non-istio environment it’d be a noop.
>
> mTLS as a requirement is more complicated but having some docs around
> using cert-manager might be enough depending on the orgs requirement.
>
> On Mon, Jun 20, 2022 at 06:18, Őrhidi Mátyás <matyas.orh...@gmail.com>
> wrote:
>
>> It seems Istio must be configured to allow Akka cluster communication to
>> bypass the Istio sidecar proxy:
>> https://doc.akka.io/docs/akka-management/current/bootstrap/istio.html
>>
>> On Mon, Jun 20, 2022 at 11:30 AM Sigalit Eliazov <e.siga...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> we have enabled HA as suggested, the task manager tries to reach the job
>>> manager via pod id as expected but
>>> the task manager is unable to connect to the job manager:
>>>
>>>
>>> 2022-06-19 22:14:45,101 INFO  
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - 
>>> Connecting to ResourceManager akka.tcp://
>>> flink@192.168.3.144:6123/user/rpc/resourcemanager_0(8a98fdb734615089485c685afb0f402d)
>>> .
>>>
>>>
>>> 2022-06-19 22:14:45,242 WARN  akka.remote.transport.netty.NettyTransport    
>>>                [] - Remote connection to [/
>>> 192.168.3.144:6123
>>> ] failed with java.io.IOException: Connection reset by peer
>>>
>>>
>>> 2022-06-19 22:14:45,249 WARN  akka.remote.ReliableDeliverySupervisor        
>>>                [] - Association with remote system [akka.tcp://
>>> flink@192.168.3.144:6123
>>> ] has failed, address is now gated for [50] ms. Reason: [Association failed 
>>> with [akka.tcp://
>>> flink@192.168.3.144:6123
>>> ]] Caused by: [The remote system explicitly disassociated (reason unknown).]
>>>
>>>
>>> 2022-06-19 22:14:45,255 INFO  
>>> org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not 
>>> resolve ResourceManager address akka.tcp://
>>> flink@192.168.3.144:6123/user/rpc/resourcemanager_0
>>> , retrying in 10000 ms: Could not connect to rpc endpoint under address 
>>> akka.tcp://
>>> flink@192.168.3.144:6123/user/rpc/resourcemanager_0.
>>>
>>> 2022-06-
>>>
>>>
>>> Are there any additional definitions required for that?
>>>
>>>
>>> thanks
>>>
>>> Sigalit
>>>
>>> On Thu, Jun 16, 2022 at 2:28 PM Yang Wang <danrtsey...@gmail.com> wrote:
>>>
>>>> Could you please have a try with high availability enabled[1]?
>>>>
>>>> If HA enabled, the internal jobmanager rpc service will not be created.
>>>> Instead, the TaskManager retrieves the JobManager address via HA services
>>>> and connects to it via pod ip.
>>>>
>>>> [1].
>>>> https://github.com/apache/flink-kubernetes-operator/blob/main/examples/basic-checkpoint-ha.yaml
>>>>
>>>>
>>>> Best,
>>>> Yang
>>>>
>>>> Elisha, Moshe (Nokia - IL/Kfar Sava) <moshe.eli...@nokia.com>
>>>> 于2022年6月16日周四 15:24写道：
>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>>
>>>>> We are launching Flink deployments using the Flink Kubernetes Operator
>>>>> <https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/>
>>>>> on a Kubernetes cluster with Istio and mTLS enabled.
>>>>>
>>>>>
>>>>>
>>>>> We found that the TaskManager is unable to communicate with the
>>>>> JobManager on the jobmanager-rpc port:
>>>>>
>>>>>
>>>>>
>>>>> 2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor
>>>>>                     [] - Association with remote system
>>>>> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]
>>>>> has failed, address is now gated for [50] ms. Reason: [Association failed
>>>>> with 
>>>>> [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]]
>>>>> Caused by: [The remote system explicitly disassociated (reason unknown).]
>>>>>
>>>>>
>>>>>
>>>>> The reason for the issue is that the JobManager service port
>>>>> definitions are not following the Istio guidelines
>>>>> https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/
>>>>> (see example below).
>>>>>
>>>>>
>>>>>
>>>>> We believe a change to the default port definitions is needed but for
>>>>> now, is there an immediate action we can take to work around the issue?
>>>>> Perhaps overriding the default port definitions somehow?
>>>>>
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> flink-kubernetes-operator 1.0.0
>>>>>
>>>>> Flink 1.14-java11
>>>>>
>>>>> Kubernetes v1.19.5
>>>>>
>>>>> Istio 1.7.6
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> # k get service inference-results-to-analytics-engine -o yaml
>>>>>
>>>>> apiVersion: v1
>>>>>
>>>>> kind: Service
>>>>>
>>>>> metadata:
>>>>>
>>>>> ...
>>>>>
>>>>>   labels:
>>>>>
>>>>>     app: inference-results-to-analytics-engine
>>>>>
>>>>>     type: flink-native-kubernetes
>>>>>
>>>>>   name: inference-results-to-analytics-engine
>>>>>
>>>>> spec:
>>>>>
>>>>>   clusterIP: None
>>>>>
>>>>>   ports:
>>>>>
>>>>>   - name: jobmanager-rpc # should start with “tcp-“ or add "
>>>>> appProtocol" property
>>>>>
>>>>>     port: 6123
>>>>>
>>>>>     protocol: TCP
>>>>>
>>>>>     targetPort: 6123
>>>>>
>>>>>   - name: blobserver # should start with "tcp-" or add "appProtocol"
>>>>> property
>>>>>
>>>>>     port: 6124
>>>>>
>>>>>     protocol: TCP
>>>>>
>>>>>     targetPort: 6124
>>>>>
>>>>>   selector:
>>>>>
>>>>>     app: inference-results-to-analytics-engine
>>>>>
>>>>>     component: jobmanager
>>>>>
>>>>>     type: flink-native-kubernetes
>>>>>
>>>>>   sessionAffinity: None
>>>>>
>>>>>   type: ClusterIP
>>>>>
>>>>> status:
>>>>>
>>>>>   loadBalancer: {}
>>>>>
>>>>>
>>>>>
>>>>

Re: Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions

Reply via email to