[jira] [Created] (FLINK-33799) Add e2e's for tls enabled operator

2023-12-11 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-33799:


 Summary: Add e2e's for tls enabled operator
 Key: FLINK-33799
 URL: https://issues.apache.org/jira/browse/FLINK-33799
 Project: Flink
  Issue Type: Technical Debt
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0
Reporter: Tony Garrard
 Fix For: kubernetes-operator-1.8.0


It would be good to create some E2E tests to ensure that a TLS-enabled Flink 
operator works, so that we don't break anything in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33673) SizeLimits not being set on emptyDir

2023-11-28 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-33673:


 Summary: SizeLimits not being set on emptyDir
 Key: FLINK-33673
 URL: https://issues.apache.org/jira/browse/FLINK-33673
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0
Reporter: Tony Garrard
 Fix For: kubernetes-operator-1.8.0


The operator should set a sizeLimit on any emptyDir volumes it creates. See 
https://main.kyverno.io/policies/other/a/add-emptydir-sizelimit/add-emptydir-sizelimit/

This issue is to set a sizeLimit. The emptyDir in question is the default one, 
used for artifacts. My initial guess at a setting would be around 512 MB.





[jira] [Created] (FLINK-33645) Env vars in config not added to Taskmanagers in Standalone mode

2023-11-24 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-33645:


 Summary: Env vars in config not added to Taskmanagers in 
Standalone mode
 Key: FLINK-33645
 URL: https://issues.apache.org/jira/browse/FLINK-33645
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.6.1, kubernetes-operator-1.7.0
Reporter: Tony Garrard
 Fix For: kubernetes-operator-1.8.0


When a FlinkDeployment provides env var config for the TaskManager, e.g.
containerized.taskmanager.env.MY_ENV_VAR: MY_DATA

the operator does not set the env vars on the TaskManager pods when running 
in Standalone mode. This works in Native mode.
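
To illustrate the expected behaviour, here is a minimal sketch of turning such config entries into env var name/value pairs. The class and method names are hypothetical and this is not the operator's actual code; it only assumes the key prefix shown in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (hypothetical name), not the operator's actual code. */
final class TaskManagerEnvVars {
    // Prefix taken from the example key in this issue.
    static final String PREFIX = "containerized.taskmanager.env.";

    /** Collects NAME -> value pairs from config entries that start with PREFIX. */
    static Map<String, String> fromConfig(Map<String, String> flinkConfig) {
        Map<String, String> env = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : flinkConfig.entrySet()) {
            String key = entry.getKey();
            // Skip unrelated keys and a bare prefix with no variable name.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                env.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return env;
    }
}
```

With the example above, fromConfig would yield a single entry, MY_ENV_VAR mapped to MY_DATA, which is what one would expect to see on the TaskManager pods in both modes.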





[jira] [Commented] (FLINK-33634) Add Conditions to Flink CRD's Status field

2023-11-24 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789478#comment-17789478
 ] 

Tony Garrard commented on FLINK-33634:
--

So we would use the io.fabric8.kubernetes.api.model.Condition class. I think 
the first condition type to implement would be _type: Ready_, which we could use 
when a FlinkApplication/FlinkSessionJob is fully running and a FlinkSessionCluster 
is ready to accept jobs. Later we could add additional types, e.g. _type: 
Warning_, to inform the user that, for example, their FlinkDeployment is using 
ephemeral storage. There is an interesting article about conditions here: 
https://maelvls.dev/kubernetes-conditions/
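
As a rough sketch of the Ready condition idea, the following uses a plain Java class rather than the fabric8 Condition class, purely to keep the example self-contained. All names are illustrative. It assumes the usual Kubernetes convention that lastTransitionTime only changes when the status actually flips.

```java
import java.time.Instant;

/** Plain-Java stand-in for a status condition; all names here are illustrative. */
final class StatusCondition {
    final String type;
    final String status;             // "True" or "False", as in the issue's example
    final String lastTransitionTime; // ISO-8601 timestamp

    StatusCondition(String type, String status, String lastTransitionTime) {
        this.type = type;
        this.status = status;
        this.lastTransitionTime = lastTransitionTime;
    }

    /** Builds a Ready condition for the given readiness at the given time. */
    static StatusCondition ready(boolean isReady, Instant now) {
        return new StatusCondition("Ready", isReady ? "True" : "False", now.toString());
    }

    /**
     * Returns the previous condition when the status is unchanged, so that
     * lastTransitionTime only moves on a real transition.
     */
    static StatusCondition update(StatusCondition previous, boolean isReady, Instant now) {
        StatusCondition next = ready(isReady, now);
        if (previous != null && previous.status.equals(next.status)) {
            return previous;
        }
        return next;
    }
}
```

A reconcile loop could call update on every pass and write the result to the CR status without churning the timestamp.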

> Add Conditions to Flink CRD's Status field
> --
>
> Key: FLINK-33634
> URL: https://issues.apache.org/jira/browse/FLINK-33634
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.7.0
>Reporter: Tony Garrard
>Priority: Major
>
> From 
> [https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties]
>  it is considered best practice to provide Conditions in the Status of CRDs. 
> Some tooling even expects there to be a Conditions field in the status of a 
> CR. This issue is to propose adding a Conditions field to the CR status
> e.g.
> status:
>     conditions:
>      - lastTransitionTime: '2023-11-23T12:38:51Z'
>        status: 'True'
>        type: Ready





[jira] [Closed] (FLINK-33633) Automatic creation of RBAC for instances of Flink Deployments

2023-11-24 Thread Tony Garrard (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Garrard closed FLINK-33633.

Resolution: Won't Do

> Automatic creation of RBAC for instances of Flink Deployments
> -
>
> Key: FLINK-33633
> URL: https://issues.apache.org/jira/browse/FLINK-33633
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.7.0
>Reporter: Tony Garrard
>Priority: Not a Priority
>
> Currently users have to manually create RBAC, e.g. the flink service account. 
> When the operator is watching all namespaces, creation of a FlinkDeployment in a 
> specific namespace may fail if the kube admin has failed to create the 
> required RBAC. To improve usability, the operator could be coded to 
> automatically create these RBAC resources in the instance namespace if they 
> are not present.





[jira] [Commented] (FLINK-33633) Automatic creation of RBAC for instances of Flink Deployments

2023-11-24 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789395#comment-17789395
 ] 

Tony Garrard commented on FLINK-33633:
--

What's the process for closing issues? I can put in a resolution of Feedback 
Received or Won't Do. Do I also need to specify a version?



[jira] [Commented] (FLINK-33633) Automatic creation of RBAC for instances of Flink Deployments

2023-11-24 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789393#comment-17789393
 ] 

Tony Garrard commented on FLINK-33633:
--

Completely understand. Thanks for your review.



[jira] [Commented] (FLINK-33633) Automatic creation of RBAC for instances of Flink Deployments

2023-11-24 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-33633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789376#comment-17789376
 ] 

Tony Garrard commented on FLINK-33633:
--

Yes, I understand the need to restrict the operator's API access. What if 
this were a Helm install option that is off by default? The added operator code 
would only be enabled by an env var configured during the Helm install.



[jira] [Created] (FLINK-33634) Add Conditions to Flink CRD's Status field

2023-11-23 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-33634:


 Summary: Add Conditions to Flink CRD's Status field
 Key: FLINK-33634
 URL: https://issues.apache.org/jira/browse/FLINK-33634
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0
Reporter: Tony Garrard


From 
[https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties]
it is considered best practice to provide Conditions in the Status of CRDs. 
Some tooling even expects there to be a Conditions field in the status of a 
CR. This issue is to propose adding a Conditions field to the CR status

e.g.
status:
    conditions:
     - lastTransitionTime: '2023-11-23T12:38:51Z'
       status: 'True'
       type: Ready





[jira] [Updated] (FLINK-33633) Automatic creation of RBAC for instances of Flink Deployments

2023-11-23 Thread Tony Garrard (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-33633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Garrard updated FLINK-33633:
-
Priority: Not a Priority  (was: Major)



[jira] [Created] (FLINK-33633) Automatic creation of RBAC for instances of Flink Deployments

2023-11-23 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-33633:


 Summary: Automatic creation of RBAC for instances of Flink 
Deployments
 Key: FLINK-33633
 URL: https://issues.apache.org/jira/browse/FLINK-33633
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0
Reporter: Tony Garrard


Currently users have to manually create RBAC, e.g. the flink service account. 
When the operator is watching all namespaces, creation of a FlinkDeployment in a 
specific namespace may fail if the kube admin has failed to create the required 
RBAC. To improve usability, the operator could be coded to automatically create 
these RBAC resources in the instance namespace if they are not present.





[jira] [Created] (FLINK-33632) Add custom mutator plugin

2023-11-23 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-33632:


 Summary: Add custom mutator plugin
 Key: FLINK-33632
 URL: https://issues.apache.org/jira/browse/FLINK-33632
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.7.0
Reporter: Tony Garrard


Currently users have the ability to provide custom validators to the operator. 
It would be great if we followed the same pattern to provide custom mutators.





[jira] [Commented] (FLINK-31966) Flink Kubernetes operator lacks TLS support

2023-10-23 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778559#comment-17778559
 ] 

Tony Garrard commented on FLINK-31966:
--

[~gyfora] Can you assign this issue to me? I have a PR ready for review.

> Flink Kubernetes operator lacks TLS support 
> 
>
> Key: FLINK-31966
> URL: https://issues.apache.org/jira/browse/FLINK-31966
> Project: Flink
>  Issue Type: New Feature
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.4.0
>Reporter: Adrian Vasiliu
>Priority: Major
>
> *Summary*
> The Flink Kubernetes operator lacks support inside the FlinkDeployment 
> operand for configuring Flink with TLS (both one-way and mutual) for the 
> internal communication between jobmanagers and taskmanagers, and for the 
> external REST endpoint. Although a workaround exists to configure the job and 
> task managers, this breaks the operator and renders it unable to reconcile.
> *Additional information*
>  * The Apache Flink operator supports passing through custom flink 
> configuration to be applied to job and task managers.
>  * If you supply SSL-based properties, the operator can no longer speak to 
> the deployed job manager. The operator is reading the flink conf and using it 
> to create a connection to the job manager REST endpoint, but it uses the 
> truststore file paths within flink-conf.yaml, which are unresolvable from the 
> operator. This leaves the operator hanging in a pending state as it cannot 
> complete a reconcile.
> *Proposal*
> Our proposal is to make changes to the operator code. A simple change exists 
> that would be enough to enable anonymous SSL at the REST endpoint, but more 
> invasive changes would be required to enable full mTLS throughout.
> The simple change to enable anonymous SSL would be for the operator to parse 
> flink-conf and podTemplate to identify the Kubernetes resource that contains 
> the certificate from the job manager keystore and use it inside the 
> operator’s trust store.
> In the case of mutual TLS, further changes are required: the operator would 
> need to generate a certificate signed by the same issuing authority as the 
> job manager’s certificates and then use it in a keystore when challenged by 
> that job manager. We propose that the operator becomes responsible for making 
> CertificateSigningRequests to generate certificates for job manager, task 
> manager and operator. The operator can then coordinate deploying the job and 
> task managers with the correct flink-conf and volume mounts. This would also 
> work for anonymous SSL.





[jira] [Commented] (FLINK-31966) Flink Kubernetes operator lacks TLS support

2023-10-11 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774130#comment-17774130
 ] 

Tony Garrard commented on FLINK-31966:
--

Hi [~gyfora], [~martijnvisser], I did some initial coding of a solution. I 
modified the AbstractFlinkService's getClusterClient method so that it copies 
the relevant certs from the secret mount into a defined directory, 
/tmp/\{namespace}/\{clusterId}, and overrides the config of the cluster client 
to point to these files. This means that all the REST calls the operator makes 
work well. However, I can't find a way of merging the operator's config with 
that of the FlinkDeployment so that they can interoperate. Currently, when the 
operator runs either submitApplicationCluster or submitSessionCluster, it emits 
a stack trace complaining that it can't find the relevant file. After a short 
time, however, the application or session cluster starts up fine and the status 
of the relevant Flink cluster corrects itself. 

I currently can't see a way of getting the SSL config to work in both the 
operator and the cluster unless the certs are placed in the same location. Do 
you have any ideas? My only thought would be to mount an emptyDir in the 
operator, e.g. /flink/certs, and then document that the kubernetes.secrets mount 
points must be defined on the CR so that they are placed in a unique directory 
within that folder.

E.g.

{{kubernetes.secrets: my-ssl-cert-secret:/flink/certs/\{clusterid}}}

{{security.ssl.rest.keystore: /flink/certs/\{clusterid}/keystore.jks}}

This way, the operator would be able to copy the certs from the secret and 
place them in the same location defined in the config. Do you think this is an 
acceptable approach?
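
For discussion, the per-cluster directory and config override described at the start of this comment could be sketched roughly as below. The class and method names are hypothetical and this is not the code in my branch. It assumes only the two REST store keys (security.ssl.rest.keystore and security.ssl.rest.truststore) need re-pointing, and that the file names are kept as-is.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical. */
final class OperatorCertPaths {
    // Assumption: only the REST keystore/truststore paths are re-pointed.
    static final List<String> STORE_KEYS = Arrays.asList(
            "security.ssl.rest.keystore",
            "security.ssl.rest.truststore");

    /** Directory unique to one cluster, e.g. baseDir/namespace/clusterId. */
    static Path clusterCertDir(Path baseDir, String namespace, String clusterId) {
        return baseDir.resolve(namespace).resolve(clusterId);
    }

    /** Returns a copy of the config with store paths re-pointed at localDir. */
    static Map<String, String> withLocalStores(Map<String, String> config, Path localDir) {
        Map<String, String> result = new HashMap<>(config);
        for (String key : STORE_KEYS) {
            String original = config.get(key);
            if (original == null) {
                continue; // key not set on this deployment
            }
            Path fileName = Paths.get(original).getFileName();
            if (fileName != null) {
                // Keep the file name, swap the directory for the operator-local one.
                result.put(key, localDir.resolve(fileName).toString());
            }
        }
        return result;
    }
}
```

The input config is left untouched, so the deployment keeps its own paths while only the operator's client config is changed.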



[jira] [Commented] (FLINK-31966) Flink Kubernetes operator lacks TLS support

2023-09-12 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-31966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764117#comment-17764117
 ] 

Tony Garrard commented on FLINK-31966:
--

Hi [~gyfora], I think I can take this on, but can I discuss the approach before 
I start working on it?

Most users will be placing their certificates into Kubernetes secrets (either 
by using cert-manager or by creating them manually). We should be able to link 
the kubernetes.secrets property to the paths used in the security.ssl sections 
to find the secret used for the REST service. The operator will need to create 
those stores locally on its disk (probably in /tmp). Then the relevant config 
used when creating the REST client needs to be modified to point to the 
location where we've created those files. The place I've identified that will 
need modifications to the Flink config appears to be AbstractFlinkService's 
getClusterClient, though there is another section in submitClusterInternal, 
which uses the default ClusterClientFactory, where we would also somehow need 
to make changes.

Do you think this is the right approach, and have I missed any other place 
where the config would need to be changed?
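
To make the linking step concrete, here is a minimal sketch of matching a store path against the kubernetes.secrets mounts. The names are hypothetical. It assumes the property value has the form name:/mount/path, with multiple entries separated by commas.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; names are hypothetical. */
final class SecretMountLookup {

    /** Parses "name:/path,other:/path2" into secret name -> mount path. */
    static Map<String, String> parseSecrets(String value) {
        Map<String, String> mounts = new LinkedHashMap<>();
        if (value == null) {
            return mounts;
        }
        for (String entry : value.split(",")) {
            String[] parts = entry.trim().split(":", 2);
            if (parts.length != 2) {
                continue; // malformed entry, skipped in this sketch
            }
            String name = parts[0].trim();
            String path = parts[1].trim();
            if (!name.isEmpty() && !path.isEmpty()) {
                mounts.put(name, path);
            }
        }
        return mounts;
    }

    /** Finds the secret whose mount path contains the given file path. */
    static Optional<String> secretForFile(Map<String, String> mounts, String filePath) {
        Path file = Paths.get(filePath).normalize();
        for (Map.Entry<String, String> entry : mounts.entrySet()) {
            // Path.startsWith compares whole path components, not raw strings.
            if (file.startsWith(Paths.get(entry.getValue()).normalize())) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

The operator could run this lookup for each store path in the security.ssl settings and then read the matching secret to build its local copies.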



[jira] [Commented] (FLINK-30577) OpenShift FlinkSessionJob artifact write error on non-default namespaces

2023-01-19 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678695#comment-17678695
 ] 

Tony Garrard commented on FLINK-30577:
--

[~gyfora] you can assign the issue to me

> OpenShift FlinkSessionJob artifact write error on non-default namespaces
> 
>
> Key: FLINK-30577
> URL: https://issues.apache.org/jira/browse/FLINK-30577
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.3.0
>Reporter: James Busche
>Priority: Major
>
> [~tagarr] has pointed out an issue with using the /opt/flink/artifacts 
> filesystem on OpenShift in non-default namespaces.  The OpenShift permissions 
> don't allow write to /opt.  
> ```
> org.apache.flink.util.FlinkRuntimeException: Failed to create the dir: 
> /opt/flink/artifacts/jim/basic-session-deployment-only-example/basic-session-job-only-example
> ```
> A few ways to solve the problem are:
> 1. Remove the comment on line 34 here in 
> [flink-conf.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/conf/flink-conf.yaml#L34]
>  and change it to: /tmp/flink/artifacts
> 2. Append this after line 143 here in 
> [values.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/values.yaml#L142]:
> kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
> 3.  Changing it in line 142 of 
> [KubernetesOperatorConfigOptions.java|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java#L142]
>  like this:
> .defaultValue("/tmp/flink/artifacts") 
> and then rebuilding the operator image.





[jira] [Commented] (FLINK-30577) OpenShift FlinkSessionJob artifact write error on non-default namespaces

2023-01-19 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-30577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678693#comment-17678693
 ] 

Tony Garrard commented on FLINK-30577:
--

So the issue is not just with OLM, but also with the Helm-installed operator. 
One simple fix would be to add an emptyDir volume mount for /opt/flink/artifacts 
to the Helm chart in flink-operator.yaml. Another workaround on OpenShift would 
be to add the anyuid SCC to the operator install namespace, i.e. run 

oc adm policy add-scc-to-group anyuid system:serviceaccounts:

 

 



[jira] [Commented] (FLINK-29536) Add WATCH_NAMESPACES env var to kubernetes operator

2022-10-31 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626689#comment-17626689
 ] 

Tony Garrard commented on FLINK-29536:
--

Created PR for the required changes

> Add WATCH_NAMESPACES env var to kubernetes operator
> ---
>
> Key: FLINK-29536
> URL: https://issues.apache.org/jira/browse/FLINK-29536
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.2.0
>Reporter: Tony Garrard
>Assignee: Tony Garrard
>Priority: Major
>  Labels: pull-request-available
>
> Provide the ability to set the namespaces watched by the operator using an 
> env var. Whilst the additional config can still be used, the presence of the 
> env var will take priority.
>  
> Reasons for issue
>  # The operator will pick up the setting immediately, as the pod will roll 
> (rather than waiting for the config to be refreshed)
>  # If the operator is to be OLM bundled, we will be able to set the target 
> namespace using the following:
> {{env:}}
> {{  - name: WATCHED_NAMESPACE}}
> {{    valueFrom:}}
> {{      fieldRef:}}
> {{        fieldPath: metadata.annotations['olm.targetNamespaces']}}





[jira] [Commented] (FLINK-29536) Add WATCH_NAMESPACES env var to kubernetes operator

2022-10-25 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623970#comment-17623970
 ] 

Tony Garrard commented on FLINK-29536:
--

[~gyfora] can you assign this to me thanks



[jira] [Created] (FLINK-29536) Add WATCH_NAMESPACES env var to kubernetes operator

2022-10-07 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-29536:


 Summary: Add WATCH_NAMESPACES env var to kubernetes operator
 Key: FLINK-29536
 URL: https://issues.apache.org/jira/browse/FLINK-29536
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.0
Reporter: Tony Garrard
 Fix For: kubernetes-operator-1.2.0


Provide the ability to set the namespaces watched by the operator using an env 
var. Whilst the additional config can still be used, the presence of the env 
var will take priority.

Reasons for issue
 # The operator will pick up the setting immediately, as the pod will roll 
(rather than waiting for the config to be refreshed)
 # If the operator is to be OLM bundled, we will be able to set the target 
namespace using the following:

{{env:}}
{{  - name: WATCHED_NAMESPACE}}
{{    valueFrom:}}
{{      fieldRef:}}
{{        fieldPath: metadata.annotations['olm.targetNamespaces']}}
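
The precedence rule could look roughly like the sketch below. The class and method names are hypothetical. It assumes both the env var and the existing config hold a comma-separated list of namespaces, and it takes the two raw values as parameters rather than fixing the env var name.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative sketch only; names are hypothetical. */
final class WatchedNamespaces {

    /**
     * Resolves the watched namespaces: a non-blank env var value takes
     * priority, otherwise the existing config value is used.
     */
    static Set<String> resolve(String envValue, String configValue) {
        boolean envPresent = envValue != null && !envValue.trim().isEmpty();
        String source = envPresent ? envValue : configValue;

        Set<String> namespaces = new LinkedHashSet<>();
        if (source == null) {
            return namespaces; // nothing configured; caller decides what that means
        }
        for (String ns : source.split(",")) {
            String trimmed = ns.trim();
            if (!trimmed.isEmpty()) {
                namespaces.add(trimmed);
            }
        }
        return namespaces;
    }
}
```

A caller would pass in System.getenv of whichever variable name is chosen, together with the value from the operator config.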





[jira] [Commented] (FLINK-29283) Remove hardcoded apiVersion from operator unit test

2022-09-14 Thread Tony Garrard (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-29283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17604758#comment-17604758
 ] 

Tony Garrard commented on FLINK-29283:
--

Absolutely, assign it to me

> Remove hardcoded apiVersion from operator unit test
> ---
>
> Key: FLINK-29283
> URL: https://issues.apache.org/jira/browse/FLINK-29283
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Affects Versions: kubernetes-operator-1.1.0
>Reporter: Tony Garrard
>Priority: Minor
>  Labels: pull-request-available
> Fix For: kubernetes-operator-1.2.0
>
>
> The unit test 
> flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/utils/ReconciliationUtilsTest.java
>  has a hardcoded apiVersion. To facilitate modifications, it should be using 
> the constants provided in the class CrdConstants i.e. 
> assertEquals(API_GROUP + "/" + API_VERSION, 
> internalMeta.get("apiVersion").asText());
> instead of "flink.apache.org/v1beta1"





[jira] [Created] (FLINK-29283) Remove hardcoded apiVersion from operator unit test

2022-09-13 Thread Tony Garrard (Jira)
Tony Garrard created FLINK-29283:


 Summary: Remove hardcoded apiVersion from operator unit test
 Key: FLINK-29283
 URL: https://issues.apache.org/jira/browse/FLINK-29283
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Tony Garrard
 Fix For: kubernetes-operator-1.2.0


The unit test 
flink-kubernetes-operator/src/test/java/org/apache/flink/kubernetes/operator/utils/ReconciliationUtilsTest.java
 has a hardcoded apiVersion. To facilitate modifications, it should be using 
the constants provided in the class CrdConstants i.e. 

assertEquals(API_GROUP + "/" + API_VERSION, 
internalMeta.get("apiVersion").asText());

instead of "flink.apache.org/v1beta1"
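
As a self-contained illustration of the idea, the sketch below builds the expected value from constants. The two fields are stand-ins for the CrdConstants fields named above; their values are simply taken from the literal this issue wants to replace, and the class name is hypothetical.

```java
/** Illustrative stand-in; in the real test the constants come from CrdConstants. */
final class ApiVersionExpectation {
    // Values assumed from the hardcoded literal "flink.apache.org/v1beta1".
    static final String API_GROUP = "flink.apache.org";
    static final String API_VERSION = "v1beta1";

    /** Builds the expected apiVersion string from the two constants. */
    static String expectedApiVersion() {
        return API_GROUP + "/" + API_VERSION;
    }
}
```

If the group or version ever changes, only the constants need updating and the test assertion follows automatically.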


