[jira] [Created] (FLINK-33997) Typo in the doc `classloader.parent-first-patterns-additional`

2024-01-04 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-33997:
-

 Summary: Typo in the doc 
`classloader.parent-first-patterns-additional`
 Key: FLINK-33997
 URL: https://issues.apache.org/jira/browse/FLINK-33997
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.18.0
Reporter: Matyas Orhidi


Typo in the doc:
[https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/debugging/debugging_classloading/#unloading-of-dynamically-loaded-classes-in-user-code]

classloader.parent-first-patterns-additional -> 
classloader.parent-first-patterns.additional



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32690) Report Double.NAN instead of null for missing autoscaler metrics

2023-07-26 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-32690:
-

 Summary: Report Double.NAN instead of null for missing autoscaler 
metrics
 Key: FLINK-32690
 URL: https://issues.apache.org/jira/browse/FLINK-32690
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.7.0


Change null values to Double.NaN for autoscaler metrics during blackout periods 
when no data is gathered. This appears to be a more common practice than null. 
It is also consistent with the other metrics we have.
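
A minimal sketch of the idea, assuming the collected values are kept in a plain 
map keyed by metric name. The class and method names are hypothetical and are 
not taken from the operator code base:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only, not the operator's actual code. */
final class MetricValueSketch {

    /** Returns the collected value, or Double.NaN when nothing was gathered for the metric. */
    static double valueOrNaN(Map<String, Double> collected, String metricName) {
        Double value = collected.get(metricName);
        return value != null ? value : Double.NaN;
    }

    public static void main(String[] args) {
        Map<String, Double> collected = new HashMap<>();
        collected.put("LOAD_MAX", 0.75);

        // A metric that was gathered is reported as-is.
        System.out.println(valueOrNaN(collected, "LOAD_MAX"));
        // A metric with no data is reported as NaN instead of null.
        System.out.println(valueOrNaN(collected, "RECOMMENDED_PARALLELISM"));
    }
}
```

Note that NaN never compares equal to itself, so consumers should test for it 
with Double.isNaN rather than with ==.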





[jira] [Created] (FLINK-32272) Expose LOAD_MAX as autoscaler metric

2023-06-06 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-32272:
-

 Summary: Expose LOAD_MAX as autoscaler metric
 Key: FLINK-32272
 URL: https://issues.apache.org/jira/browse/FLINK-32272
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.6.0


LOAD_MAX is a metric that helps identify the busiest vertices, a.k.a. hot 
spots, in the job graph.





[jira] [Created] (FLINK-32271) Report RECOMMENDED_PARALLELISM as an autoscaler metric

2023-06-06 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-32271:
-

 Summary: Report RECOMMENDED_PARALLELISM as an autoscaler metric
 Key: FLINK-32271
 URL: https://issues.apache.org/jira/browse/FLINK-32271
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.6.0


It is beneficial to report the recommended parallelism and overlay it with the 
current parallelism on the same chart when the autoscaler is running in advisor 
mode.





[jira] [Created] (FLINK-31717) Unit tests running with local kube config

2023-04-03 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-31717:
-

 Summary: Unit tests running with local kube config
 Key: FLINK-31717
 URL: https://issues.apache.org/jira/browse/FLINK-31717
 Project: Flink
  Issue Type: New Feature
  Components: Kubernetes Operator
Reporter: Matyas Orhidi


Some unit tests use the local kube environment. This can be dangerous when it 
points to sensitive clusters, e.g. in prod.

{{2023-04-03 12:32:53,956 i.f.k.c.Config [DEBUG] Found for 
Kubernetes config at: [/Users//.kube/config].
}}
A misconfigured kube config environment revealed the issue:

{{[ERROR] Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.012 s <<< FAILURE! - in org.apache.flink.kubernetes.operator.FlinkOperatorTest
[ERROR] org.apache.flink.kubernetes.operator.FlinkOperatorTest.testConfigurationPassedToJOSDK  Time elapsed: 0.008 s  <<< ERROR!
java.lang.NullPointerException
at org.apache.flink.kubernetes.operator.FlinkOperatorTest.testConfigurationPassedToJOSDK(FlinkOperatorTest.java:63)

[ERROR] org.apache.flink.kubernetes.operator.FlinkOperatorTest.testLeaderElectionConfig  Time elapsed: 0.004 s  <<< ERROR!
java.lang.NullPointerException
at org.apache.flink.kubernetes.operator.FlinkOperatorTest.testLeaderElectionConfig(FlinkOperatorTest.java:108)
}}

move ~/.kube/config





[jira] [Created] (FLINK-31611) Add delayed restart to failed jobs

2023-03-24 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-31611:
-

 Summary: Add delayed restart to failed jobs
 Key: FLINK-31611
 URL: https://issues.apache.org/jira/browse/FLINK-31611
 Project: Flink
  Issue Type: New Feature
Reporter: Matyas Orhidi


The operator can already restart failed jobs using:
{{kubernetes.operator.job.restart.failed: true}}

However, it is beneficial to keep a failed job around for a while for inspection:
{{kubernetes.operator.job.restart.failed.delay: 5m}}
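
The intended behaviour could be sketched roughly as follows. This is only an 
illustration: the class, the method and its parameters are hypothetical, and the 
sketch assumes the operator knows the time at which the job failed:

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of a delayed-restart decision, not the operator's actual code. */
final class DelayedRestartSketch {

    /**
     * Decides whether a failed job should be restarted now.
     *
     * @param restartFailed value of kubernetes.operator.job.restart.failed
     * @param delay         value of the proposed kubernetes.operator.job.restart.failed.delay
     * @param failedAt      time at which the job was observed as failed (assumed to be known)
     * @param now           current time
     */
    static boolean shouldRestart(boolean restartFailed, Duration delay, Instant failedAt, Instant now) {
        if (!restartFailed) {
            return false;
        }
        // Restart only once the inspection window has fully elapsed.
        return !now.isBefore(failedAt.plus(delay));
    }

    public static void main(String[] args) {
        Instant failedAt = Instant.parse("2023-03-24T10:00:00Z");
        Duration delay = Duration.ofMinutes(5);

        System.out.println(shouldRestart(true, delay, failedAt, failedAt.plusSeconds(60)));
        System.out.println(shouldRestart(true, delay, failedAt, failedAt.plusSeconds(300)));
    }
}
```

With a zero delay the check degenerates to the existing behaviour, i.e. the job 
is restarted as soon as the failure is observed.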





[jira] [Created] (FLINK-30609) Add ephemeral storage to CRD

2023-01-09 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-30609:
-

 Summary: Add ephemeral storage to CRD
 Key: FLINK-30609
 URL: https://issues.apache.org/jira/browse/FLINK-30609
 Project: Flink
  Issue Type: New Feature
Reporter: Matyas Orhidi


We should consider adding ephemeral storage to the existing resource 
specification in the CRD, next to `cpu` and `memory`:

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#setting-requests-and-limits-for-local-ephemeral-storage





[jira] [Created] (FLINK-30330) Exclude .github from source release(s)

2022-12-07 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-30330:
-

 Summary: Exclude .github from source release(s)
 Key: FLINK-30330
 URL: https://issues.apache.org/jira/browse/FLINK-30330
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Matyas Orhidi








[jira] [Created] (FLINK-30157) Trigger Events Before JM Recovery and Unhealthy Job Restarts

2022-11-22 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-30157:
-

 Summary: Trigger Events Before JM Recovery and Unhealthy Job 
Restarts
 Key: FLINK-30157
 URL: https://issues.apache.org/jira/browse/FLINK-30157
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.3.0


We should emit specific events for the following cases:
 * JM recovery
 * Unhealthy Job Restarts





[jira] [Created] (FLINK-29744) Throw DeploymentFailedException on ImagePullBackOff

2022-10-24 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29744:
-

 Summary: Throw DeploymentFailedException on ImagePullBackOff
 Key: FLINK-29744
 URL: https://issues.apache.org/jira/browse/FLINK-29744
 Project: Flink
  Issue Type: Improvement
Reporter: Matyas Orhidi








[jira] [Created] (FLINK-29619) Remove redundant MeterView updater thread from KubernetesClientMetrics

2022-10-13 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29619:
-

 Summary: Remove redundant MeterView updater thread from 
KubernetesClientMetrics
 Key: FLINK-29619
 URL: https://issues.apache.org/jira/browse/FLINK-29619
 Project: Flink
  Issue Type: Bug
Reporter: Matyas Orhidi


The `MetricRegistryImpl` already has a solution to update `MeterView` objects 
periodically.

https://github.com/apache/flink/blob/7a509c46e45b9a91f2b7d01f13afcdef266b1faf/flink-runtime/src/main/java/org/apache/flink/runtime/metrics/MetricRegistryImpl.java#L404





[jira] [Created] (FLINK-29475) Add WARNING/ERROR checker for the operator in e2e tests

2022-09-29 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29475:
-

 Summary: Add WARNING/ERROR checker for the operator in e2e tests
 Key: FLINK-29475
 URL: https://issues.apache.org/jira/browse/FLINK-29475
 Project: Flink
  Issue Type: Improvement
Affects Versions: kubernetes-operator-1.3.0
Reporter: Matyas Orhidi


We can also try eliminating unwanted warnings like:

{{[WARN ] The client is using resource type 'flinkdeployments' with unstable 
version 'v1beta1'}}





[jira] [Created] (FLINK-29474) Name collision: Group already contains a Metric with the name

2022-09-29 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29474:
-

 Summary: Name collision: Group already contains a Metric with the 
name
 Key: FLINK-29474
 URL: https://issues.apache.org/jira/browse/FLINK-29474
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


Running

kubectl create -f examples/basic-session-deployment-and-job.yaml

results in warnings:
{quote}
flink-kubernetes-operator 2022-09-29 13:30:00,001 o.a.f.m.MetricGroup [WARN ][default/basic-session-job-example] Name collision: Group already contains a Metric with the name 'TimeSeconds'. Metric will not be reported.[flink-kubernetes-operator-6f9bbfd557-ljp6w, k8soperator, default, flink-kubernetes-operator, system, Lifecycle, Transition, Resume]
flink-kubernetes-operator 2022-09-29 13:30:00,001 o.a.f.m.MetricGroup [WARN ][default/basic-session-job-example] Name collision: Group already contains a Metric with the name 'TimeSeconds'. Metric will not be reported.[flink-kubernetes-operator-6f9bbfd557-ljp6w, k8soperator, default, flink-kubernetes-operator, system, Lifecycle, Transition, Upgrade]
{quote}





[jira] [Created] (FLINK-29327) Operator configs are showing up among standard Flink configs

2022-09-16 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29327:
-

 Summary: Operator configs are showing up among standard Flink 
configs
 Key: FLINK-29327
 URL: https://issues.apache.org/jira/browse/FLINK-29327
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0








[jira] [Created] (FLINK-29322) Expose savepoint format on Web UI

2022-09-16 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29322:
-

 Summary: Expose savepoint format on Web UI
 Key: FLINK-29322
 URL: https://issues.apache.org/jira/browse/FLINK-29322
 Project: Flink
  Issue Type: New Feature
  Components: Runtime / Web Frontend
Reporter: Matyas Orhidi


The savepoint format is not exposed on the Web UI, so users have to remember 
how they triggered the savepoint.





[jira] [Created] (FLINK-29313) Some config overrides are ignored when set under spec.flinkConfiguration

2022-09-15 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29313:
-

 Summary: Some config overrides are ignored when set under 
spec.flinkConfiguration
 Key: FLINK-29313
 URL: https://issues.apache.org/jira/browse/FLINK-29313
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


Some 
[configs|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/#resourceuser-configuration]
 that can be specified under spec.flinkConfiguration won't take effect without 
an upgrade, e.g.:
 * {{kubernetes.operator.periodic.savepoint.interval}}
 * {{kubernetes.operator.savepoint.format.type}}

These properties are used mainly from the so-called 'observeConfig', and won't 
be available in the operator until the job is restarted. Ideally it should be 
possible to change them without an upgrade, but at the moment they won't take 
effect at all.

 

 





[jira] [Created] (FLINK-29261) Consider using FAIL_ON_UNKNOWN_PROPERTIES in the Operator

2022-09-12 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29261:
-

 Summary: Consider using FAIL_ON_UNKNOWN_PROPERTIES in the Operator
 Key: FLINK-29261
 URL: https://issues.apache.org/jira/browse/FLINK-29261
 Project: Flink
  Issue Type: Bug
Reporter: Matyas Orhidi


The operator cannot be downgraded once the CR specification is written to the 
`status`:
 
Caused by: com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "mode" (class org.apache.flink.kubernetes.operator.crd.spec.FlinkDeploymentSpec), not marked as ignorable (12 known properties: "restartNonce", "imagePullPolicy", "ingress", "flinkConfiguration", "serviceAccount", "image", "job", "podTemplate", "jobManager", "logConfiguration", "flinkVersion", "taskManager"])
 at [Source: UNKNOWN; byte offset: #UNKNOWN] (through reference chain: org.apache.flink.kubernetes.operator.crd.spec.FlinkDeploymentSpec["mode"])
at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:1127)
at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:1989)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1700)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1678)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:319)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:176)
at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:4650)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2831)
at com.fasterxml.jackson.databind.ObjectMapper.treeToValue(ObjectMapper.java:3295)
at org.apache.flink.kubernetes.operator.reconciler.ReconciliationUtils.deserializeSpecWithMeta(ReconciliationUtils.java:288)
... 18 more





[jira] [Created] (FLINK-29251) Send CREATED status and Cancel event via FlinkResourceListener

2022-09-09 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29251:
-

 Summary: Send CREATED status and Cancel event via 
FlinkResourceListener
 Key: FLINK-29251
 URL: https://issues.apache.org/jira/browse/FLINK-29251
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


To complete the lifecycle history of a custom resource the operator should send:
 * CREATED status notification during initial deployment of a CR
 * Cancel event when deleting a CR





[jira] [Created] (FLINK-29194) Add LoggingResourceListener as default

2022-09-05 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-29194:
-

 Summary: Add LoggingResourceListener as default
 Key: FLINK-29194
 URL: https://issues.apache.org/jira/browse/FLINK-29194
 Project: Flink
  Issue Type: New Feature
  Components: Deployment / Kubernetes
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


For auditing/debugging purposes the operator needs a way to report the emitted 
events / status updates in the logs:

{{[DEBUG] [default.basic-example] Event  | Info    | SpecChanged     | UPGRADE 
change(s) detected (FlinkDeploymentSpec[image=flink:1.15,restartNonce=] 
differs from FlinkDeploymentSpec[image=flink:1.15asdf,restartNonce=123]), 
starting reconciliation.}}
{{[DEBUG] [default.basic-example] Event  | Info    | Suspended       | 
Suspending existing deployment.}}
{{[DEBUG] [default.basic-example] Status | Info    | UPGRADING       | The 
resource is being upgraded }}
{{[DEBUG] [default.basic-example] Status | Info    | UPGRADING       | The 
resource is being upgraded }}
{{[DEBUG] [default.basic-example] Event  | Info    | Submit          | Starting 
deployment}}
{{[DEBUG] [default.basic-example] Status | Info    | DEPLOYED        | The 
resource is deployed/submitted to Kubernetes, but it’s not yet considered to be 
stable and might be rolled back in the future }}
{{[DEBUG] [default.basic-example] Status | Info    | DEPLOYED        | The 
resource is deployed/submitted to Kubernetes, but it’s not yet considered to be 
stable and might be rolled back in the future }}
{{[DEBUG] [default.basic-example] Event  | Info    | StatusChanged   | Job 
status changed from RECONCILING to CREATED}}
{{[DEBUG] [default.basic-example] Status | Info    | DEPLOYED        | The 
resource is deployed/submitted to Kubernetes, but it’s not yet considered to be 
stable and might be rolled back in the future }}
{{[DEBUG] [default.basic-example] Event  | Info    | StatusChanged   | Job 
status changed from CREATED to RUNNING}}
{{[DEBUG] [default.basic-example] Status | Info    | STABLE          | The 
resource deployment is considered to be stable and won’t be rolled back }}





[jira] [Created] (FLINK-28594) Add metrics for FlinkService

2022-07-18 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28594:
-

 Summary: Add metrics for FlinkService
 Key: FLINK-28594
 URL: https://issues.apache.org/jira/browse/FLINK-28594
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


We need some metrics for the `FlinkService` to be able to tell how long most 
of the blocking operations in this service take.





[jira] [Created] (FLINK-28593) Introduce default ingress templates at operator level

2022-07-18 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28593:
-

 Summary: Introduce default ingress templates at operator level
 Key: FLINK-28593
 URL: https://issues.apache.org/jira/browse/FLINK-28593
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


Ingress templates are currently [defined at CR 
level|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/ingress/],
 but these rules could be enabled globally at the operator level too.





[jira] [Created] (FLINK-28592) Implement custom resource counters as counters not gauges

2022-07-18 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28592:
-

 Summary: Implement custom resource counters as counters not gauges
 Key: FLINK-28592
 URL: https://issues.apache.org/jira/browse/FLINK-28592
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.2.0


 * change the current implementation to counters
 * add counters at global level





[jira] [Created] (FLINK-28564) Update NOTICE/LICENCE files for 1.1.0 release

2022-07-15 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28564:
-

 Summary: Update NOTICE/LICENCE files for 1.1.0 release
 Key: FLINK-28564
 URL: https://issues.apache.org/jira/browse/FLINK-28564
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0








[jira] [Created] (FLINK-28517) Bump Flink version to 1.15.1

2022-07-12 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28517:
-

 Summary: Bump Flink version to 1.15.1
 Key: FLINK-28517
 URL: https://issues.apache.org/jira/browse/FLINK-28517
 Project: Flink
  Issue Type: Improvement
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0








[jira] [Created] (FLINK-28476) Add metrics for Kubernetes API server access

2022-07-09 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28476:
-

 Summary: Add metrics for Kubernetes API server access
 Key: FLINK-28476
 URL: https://issues.apache.org/jira/browse/FLINK-28476
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


e.g.:
 * http response counter
 * http response latency histogram
 * http response status counter
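
As a rough illustration of the three metrics listed above, the sketch below 
keeps them in plain JDK data structures. Everything in it is an assumption made 
for the example: the class name, the bucket bounds and the record method are 
hypothetical, and a real implementation would register Flink metric types and 
hook into the Kubernetes client instead:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of HTTP response metrics, not the operator's actual code. */
final class HttpResponseMetricsSketch {

    // Upper bounds (in milliseconds) of the latency buckets; the last bucket is open-ended.
    private static final long[] BUCKET_BOUNDS_MS = {10, 100, 1000};

    private final LongAdder responseCount = new LongAdder();
    private final Map<Integer, LongAdder> statusCounts = new ConcurrentHashMap<>();
    private final AtomicLongArray latencyBuckets =
            new AtomicLongArray(BUCKET_BOUNDS_MS.length + 1);

    /** Records one HTTP response with its status code and latency. */
    void record(int statusCode, long latencyMs) {
        responseCount.increment();
        statusCounts.computeIfAbsent(statusCode, code -> new LongAdder()).increment();
        latencyBuckets.incrementAndGet(bucketIndex(latencyMs));
    }

    /** Returns the index of the first bucket whose upper bound covers the latency. */
    static int bucketIndex(long latencyMs) {
        for (int i = 0; i < BUCKET_BOUNDS_MS.length; i++) {
            if (latencyMs <= BUCKET_BOUNDS_MS[i]) {
                return i;
            }
        }
        return BUCKET_BOUNDS_MS.length;
    }

    long responses() {
        return responseCount.sum();
    }

    long responsesWithStatus(int statusCode) {
        LongAdder adder = statusCounts.get(statusCode);
        return adder == null ? 0 : adder.sum();
    }

    long latencyBucket(int index) {
        return latencyBuckets.get(index);
    }
}
```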





[jira] [Created] (FLINK-28445) Support dynamic configurations

2022-07-07 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28445:
-

 Summary: Support dynamic configurations
 Key: FLINK-28445
 URL: https://issues.apache.org/jira/browse/FLINK-28445
 Project: Flink
  Issue Type: New Feature
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


It is beneficial in certain scenarios to load operator configurations from 
multiple ConfigMaps. For example, `kubernetes.operator.watched.namespaces` 
is a typical property maintained by control planes, while the rest is 
maintained by the Operator. By allowing the configuration to be loaded from 
multiple ConfigMaps, the default configuration can be owned by the Operator and 
other environment-specific overrides by a control plane. This also allows 
upgrading the Operator independently of control planes.
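
The layering could look roughly like the sketch below, which treats each 
ConfigMap as a plain key/value map and lets later layers override earlier ones. 
The class name and the precedence order are assumptions made for illustration, 
and `example.other.setting` is a made-up key:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of layered configuration loading, not the operator's actual code. */
final class LayeredConfigSketch {

    /** Merges the layers in order; entries of later layers override earlier ones. */
    static Map<String, String> merge(List<Map<String, String>> layers) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Defaults owned by the Operator.
        Map<String, String> operatorDefaults = Map.of(
                "kubernetes.operator.watched.namespaces", "default",
                "example.other.setting", "a");
        // Environment-specific overrides owned by a control plane.
        Map<String, String> controlPlaneOverrides = Map.of(
                "kubernetes.operator.watched.namespaces", "ns1,ns2");

        Map<String, String> effective = merge(List.of(operatorDefaults, controlPlaneOverrides));
        System.out.println(effective.get("kubernetes.operator.watched.namespaces"));
        System.out.println(effective.get("example.other.setting"));
    }
}
```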





[jira] [Created] (FLINK-28436) test_multi_sessionjob.sh is failing intermittently

2022-07-07 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28436:
-

 Summary: test_multi_sessionjob.sh is failing intermittently
 Key: FLINK-28436
 URL: https://issues.apache.org/jira/browse/FLINK-28436
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


https://github.com/apache/flink-kubernetes-operator/runs/7222745771?check_suite_focus=true





[jira] [Created] (FLINK-28389) Correct spec and status updates in FlinkDeploymentControllerTest

2022-07-05 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28389:
-

 Summary: Correct spec and status updates in 
FlinkDeploymentControllerTest
 Key: FLINK-28389
 URL: https://issues.apache.org/jira/browse/FLINK-28389
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


The testing `FlinkDeploymentController` we use in the 
FlinkDeploymentControllerTest mutates the FlinkDeployment object. This differs 
from how it works in real environments, causing inconsistent behaviour in some 
tests.





[jira] [Created] (FLINK-28331) Persist status after every observe loop

2022-06-30 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28331:
-

 Summary: Persist status after every observe loop
 Key: FLINK-28331
 URL: https://issues.apache.org/jira/browse/FLINK-28331
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


Make sure we don't lose any status information because of the reconcile logic.





[jira] [Created] (FLINK-28261) Consider Using Dependent Resources for Ingress

2022-06-27 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28261:
-

 Summary: Consider Using Dependent Resources for Ingress
 Key: FLINK-28261
 URL: https://issues.apache.org/jira/browse/FLINK-28261
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi


JOSDK 3 introduced the concept of dependent resources, which would allow us to 
handle Ingress in a more JOSDK-native way: see 
[https://javaoperatorsdk.io/docs/dependent-resources#standalone-dependent-resources.]
 This functionality could be a good fit for the standalone mode implementation 
too.





[jira] [Created] (FLINK-28223) Add artifact-fetcher to the pod-template.yaml example

2022-06-23 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28223:
-

 Summary: Add artifact-fetcher to the pod-template.yaml example
 Key: FLINK-28223
 URL: https://issues.apache.org/jira/browse/FLINK-28223
 Project: Flink
  Issue Type: Improvement
Reporter: Matyas Orhidi


We could improve the pod template example to have an artifact fetcher.





[jira] [Created] (FLINK-28186) Trigger Operator Events on Configuration Changes

2022-06-21 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28186:
-

 Summary: Trigger Operator Events on Configuration Changes
 Key: FLINK-28186
 URL: https://issues.apache.org/jira/browse/FLINK-28186
 Project: Flink
  Issue Type: Improvement
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


The Operator can already emit K8s Events related to CRs it manages, but it 
needs to emit events on important Operator related changes too, e.g. config 
updates, dynamic namespace changes, etc.





[jira] [Created] (FLINK-28166) Configurable Automatic Retries on Error

2022-06-21 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28166:
-

 Summary: Configurable Automatic Retries on Error
 Key: FLINK-28166
 URL: https://issues.apache.org/jira/browse/FLINK-28166
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


Make automatic reconciliation retries configurable. The current behaviour is 
the default defined in JOSDK: 
https://javaoperatorsdk.io/docs/features#automatic-retries-on-error





[jira] [Created] (FLINK-28141) Document Dynamic Namespaces

2022-06-20 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28141:
-

 Summary: Document Dynamic Namespaces
 Key: FLINK-28141
 URL: https://issues.apache.org/jira/browse/FLINK-28141
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0








[jira] [Created] (FLINK-28059) Parallelize e2e tests

2022-06-14 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-28059:
-

 Summary: Parallelize e2e tests
 Key: FLINK-28059
 URL: https://issues.apache.org/jira/browse/FLINK-28059
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: Matyas Orhidi


Motivation:
 * Tests are running in a loop within a single step
 * It takes 15mins for the e2e tests to finish
 * We could run 256 parallel tasks instead of the current 6
 * Without looking at the logs it is hard to spot/verify which exact tests are 
running during e2e CI workflows

Suggestions:
 * Let's add the tests into an extra dimension of the test matrix instead of 
looping
 * Try to find a way to share the common steps before/after the tests





[jira] [Created] (FLINK-27892) More than 1 secondary resource related to primary

2022-06-03 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27892:
-

 Summary: More than 1 secondary resource related to primary
 Key: FLINK-27892
 URL: https://issues.apache.org/jira/browse/FLINK-27892
 Project: Flink
  Issue Type: Bug
Reporter: Matyas Orhidi


When submitting `basic-session-job.yaml` in multiple namespaces:

{{flink-kubernetes-operator java.lang.IllegalStateException: More than 1 secondary resource related to primary
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.event.source.ResourceEventSource.getSecondaryResource(ResourceEventSource.java:19)
flink-kubernetes-operator at io.javaoperatorsdk.operator.api.reconciler.DefaultContext.getSecondaryResource(DefaultContext.java:47)
flink-kubernetes-operator at io.javaoperatorsdk.operator.api.reconciler.Context.getSecondaryResource(Context.java:15)
flink-kubernetes-operator at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.validateSessionJob(FlinkSessionJobController.java:135)
flink-kubernetes-operator at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:91)
flink-kubernetes-operator at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:51)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:201)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:153)
flink-kubernetes-operator at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:152)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:135)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:115)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:86)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:59)
flink-kubernetes-operator at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:390)
flink-kubernetes-operator at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
flink-kubernetes-operator at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
flink-kubernetes-operator at java.base/java.lang.Thread.run(Unknown Source)}}





[jira] [Created] (FLINK-27871) Dynamic configuration change is undetected on config removal

2022-06-01 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27871:
-

 Summary: Dynamic configuration change is undetected on config 
removal
 Key: FLINK-27871
 URL: https://issues.apache.org/jira/browse/FLINK-27871
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


The Operator does not detect when a configuration entry is removed from the 
ConfigMap. The equals check in *FlinkConfigManager.updateDefaultConfig* 
incorrectly returns true in this case:

 

{{if (newConf.equals(defaultConfig)) {}}
{{LOG.info("Default configuration did not change, nothing to do...");}}
{{return;}}
{{}}}
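
One way such a check can miss a removal is sketched below. This is an 
assumption, since the root cause is not stated here: if equality only verifies 
that every entry of the new configuration is present in the old one, a new 
configuration with fewer keys still compares as equal. The sketch uses plain 
maps and hypothetical method names rather than Flink's Configuration class:

```java
import java.util.Map;

/** Hypothetical sketch of a one-directional equality check, not Flink's actual code. */
final class ConfigChangeSketch {

    /** Only verifies that every entry of 'a' has the same value in 'b'. */
    static boolean oneWayEquals(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> entry : a.entrySet()) {
            if (!entry.getValue().equals(b.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Additionally compares the sizes, so a removed key is detected. */
    static boolean symmetricEquals(Map<String, String> a, Map<String, String> b) {
        return a.size() == b.size() && oneWayEquals(a, b);
    }

    public static void main(String[] args) {
        Map<String, String> defaultConfig = Map.of("key.a", "1", "key.b", "2");
        // "key.b" was removed from the ConfigMap.
        Map<String, String> newConf = Map.of("key.a", "1");

        // The removal goes unnoticed by the one-directional check.
        System.out.println(oneWayEquals(newConf, defaultConfig));
        // The size comparison exposes the difference.
        System.out.println(symmetricEquals(newConf, defaultConfig));
    }
}
```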





[jira] [Created] (FLINK-27812) Support Dynamic change of watched namespaces

2022-05-27 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27812:
-

 Summary: Support Dynamic change of watched namespaces
 Key: FLINK-27812
 URL: https://issues.apache.org/jira/browse/FLINK-27812
 Project: Flink
  Issue Type: Improvement
Reporter: Matyas Orhidi






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27714) Migrate to java-operator-sdk v3

2022-05-20 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27714:
-

 Summary: Migrate to java-operator-sdk v3
 Key: FLINK-27714
 URL: https://issues.apache.org/jira/browse/FLINK-27714
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.1.0


There are a few features we are planning to add to the operator:
 * Dynamic change of watched namespaces and automatic adjustment of related 
{{EventSources}}
 * Improved Error Handling API

Also worth evaluating:
 * Dependent resources management! See the 
[documentation|https://javaoperatorsdk.io/docs/dependent-resources] for more 
information
 * Support for following a set of namespaces in {{InformerEventSource}} and 
other related improvements.
 * Removal of the need for {{PrimaryToSecondaryMapper}}, which is now handled 
automatically

https://github.com/java-operator-sdk/java-operator-sdk/releases/tag/v3.0.0



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27665) Optimise event triggering on DeploymentFailedExceptions

2022-05-17 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27665:
-

 Summary: Optimise event triggering on DeploymentFailedExceptions
 Key: FLINK-27665
 URL: https://issues.apache.org/jira/browse/FLINK-27665
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.0.0
 Attachments: image-2022-05-17-12-08-42-597.png, 
image-2022-05-17-12-13-19-489.png

Use `EventUtils` when handling `DeploymentFailedExceptions` to avoid appending 
new events on every reconcile loop:

!image-2022-05-17-12-13-19-489.png!
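
The idea of updating an existing event instead of appending a new one on every reconcile loop can be sketched as follows. This is an in-memory illustration only: `EventDedupSketch` is a hypothetical class, it does not reproduce the operator's `EventUtils`, and it does not talk to Kubernetes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory event log; illustrates deduplication only. */
final class EventDedupSketch {

    /** Occurrence count keyed by "reason|message". */
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    /**
     * Records an event. Returns true if a new entry was created and
     * false if an existing entry only had its count incremented.
     */
    boolean record(String reason, String message) {
        String key = reason + "|" + message;
        Integer previous = counts.get(key);
        counts.put(key, previous == null ? 1 : previous + 1);
        return previous == null;
    }

    /** Number of distinct events stored. */
    int distinctEvents() {
        return counts.size();
    }

    /** How often the given event has been recorded so far. */
    int count(String reason, String message) {
        return counts.getOrDefault(reason + "|" + message, 0);
    }

    public static void main(String[] args) {
        EventDedupSketch log = new EventDedupSketch();
        // Simulate three reconcile loops hitting the same failure.
        for (int i = 0; i < 3; i++) {
            log.record("DeploymentFailed", "ImagePullBackOff");
        }
        System.out.println(log.distinctEvents() + " distinct event(s), count="
                + log.count("DeploymentFailed", "ImagePullBackOff"));
    }
}
```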

 

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27609) Tracking flink-version and flink-revision in FlinkDeploymentStatus

2022-05-13 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27609:
-

 Summary: Tracking flink-version and flink-revision in 
FlinkDeploymentStatus
 Key: FLINK-27609
 URL: https://issues.apache.org/jira/browse/FLINK-27609
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.0.0


The REST API can provide accurate versioning information through the config 
endpoint:

[https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#config]

The operator should propagate such fields in the status:
 * flink-version
 * flink-revision

This greatly improves the ability to identify problematic Flink versions 
(affected by CVEs, deprecated, etc.) in managed environments. 
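
As a rough illustration of picking the two fields out of a config response, the sketch below extracts string fields from a flat JSON object. The class name {{FlinkVersionFieldsSketch}} and the sample payload values are made up for the example, and a real client would use a proper JSON parser rather than a regular expression:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not operator code. */
final class FlinkVersionFieldsSketch {

    /**
     * Extracts a string-valued field from a flat JSON object.
     * Does not handle escaped quotes inside values; illustration only.
     */
    static Optional<String> stringField(String json, String name) {
        Pattern pattern =
                Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample payload; the values are placeholders.
        String sample =
                "{\"refresh-interval\":3000,\"flink-version\":\"1.15.0\",\"flink-revision\":\"abc1234\"}";
        System.out.println(stringField(sample, "flink-version").orElse("unknown"));
        System.out.println(stringField(sample, "flink-revision").orElse("unknown"));
    }
}
```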

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27573) Configuring a new random job result store directory

2022-05-11 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27573:
-

 Summary: Configuring a new random job result store directory
 Key: FLINK-27573
 URL: https://issues.apache.org/jira/browse/FLINK-27573
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi


Create a random job result store directory to work around:

https://issues.apache.org/jira/browse/FLINK-27569



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27520) Use admission-controller-framework in Webhook

2022-05-05 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27520:
-

 Summary: Use admission-controller-framework in Webhook
 Key: FLINK-27520
 URL: https://issues.apache.org/jira/browse/FLINK-27520
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: Matyas Orhidi


Use the released 
[https://github.com/java-operator-sdk/admission-controller-framework]

instead of the borrowed source code in the Webhook module.

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27468) Observing JobManager deployment. Previous status: MISSING

2022-05-02 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27468:
-

 Summary: Observing JobManager deployment. Previous status: MISSING
 Key: FLINK-27468
 URL: https://issues.apache.org/jira/browse/FLINK-27468
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: Matyas Orhidi


The operator keeps looping if the K8s deployment gets deleted (and probably 
also when the job is in a terminal Flink state such as FAILED). We need to 
agree on how to handle such cases and fix it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27190) Revisit error handling in main reconcile() loop

2022-04-11 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27190:
-

 Summary: Revisit error handling in main reconcile() loop
 Key: FLINK-27190
 URL: https://issues.apache.org/jira/browse/FLINK-27190
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi
 Fix For: kubernetes-operator-1.0.0


There are some improvements introduced around error handling:
 * [https://github.com/java-operator-sdk/java-operator-sdk/pull/1033]

 in the upcoming java-operator-sdk release 
[v3.0.0.RC1.|https://github.com/java-operator-sdk/java-operator-sdk/releases/tag]
 We should revisit and further simplify the error logic in 
{{FlinkDeploymentController.reconcile()}}

Currently:
 * checked exceptions are wrapped in runtime exceptions
 * validation errors are terminal errors but are handled differently

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26973) Emit events on state transitions for FlinkDeployment

2022-04-01 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26973:
-

 Summary: Emit events on state transitions for FlinkDeployment
 Key: FLINK-26973
 URL: https://issues.apache.org/jira/browse/FLINK-26973
 Project: Flink
  Issue Type: Improvement
Reporter: Matyas Orhidi


To improve observability we should emit Events during the lifecycle of 
FlinkDeployments



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26953) Introduce Operator Specific Metrics

2022-03-31 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26953:
-

 Summary: Introduce Operator Specific Metrics
 Key: FLINK-26953
 URL: https://issues.apache.org/jira/browse/FLINK-26953
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi


Beyond the basic JVM metrics the Operator currently exposes, it could report 
further Operator specific metrics, e.g.:
 * total number of deployments
 * number of active/failed jobs
 * etc.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26916) The Operator Ignores job related changes (jar, parallelism) during last-state upgrades

2022-03-29 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26916:
-

 Summary: The Operator Ignores job related changes (jar, 
parallelism) during last-state upgrades
 Key: FLINK-26916
 URL: https://issues.apache.org/jira/browse/FLINK-26916
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Matyas Orhidi


Root cause: the old job graph is being reused when resuming.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26866) Expired cert during Helm installation

2022-03-25 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26866:
-

 Summary: Expired cert during Helm installation
 Key: FLINK-26866
 URL: https://issues.apache.org/jira/browse/FLINK-26866
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Reporter: Matyas Orhidi


I have had a minikube cluster running for a while. Although the cert manager 
seems fine on it and the operator comes up, the Helm installation reports a 
concerning error:

{{helm install flink-operator helm/flink-operator}}
{{Error: INSTALLATION FAILED: failed to create resource: Internal error 
occurred: failed calling webhook "webhook.cert-manager.io": failed to call 
webhook: Post 
"https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": x509: 
certificate has expired or is not yet valid: current time 2022-03-25T11:01:46Z 
is after 2022-03-21T08:31:13Z}}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26862) Link the Github repository from Operator documentation

2022-03-25 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26862:
-

 Summary: Link the Github repository from Operator documentation
 Key: FLINK-26862
 URL: https://issues.apache.org/jira/browse/FLINK-26862
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26817) Update ingress docs with templating examples

2022-03-23 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26817:
-

 Summary: Update ingress docs with templating examples
 Key: FLINK-26817
 URL: https://issues.apache.org/jira/browse/FLINK-26817
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26765) Document RBAC model

2022-03-21 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26765:
-

 Summary: Document RBAC model
 Key: FLINK-26765
 URL: https://issues.apache.org/jira/browse/FLINK-26765
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26706) Introduce Ingress URL templating

2022-03-17 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26706:
-

 Summary: Introduce Ingress URL templating
 Key: FLINK-26706
 URL: https://issues.apache.org/jira/browse/FLINK-26706
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi


Instead of the current basic `ingressDomain` based approach, we could 
introduce a more advanced templating mechanism. 

Check the Spark Operator's approach for reference:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md#driver-ui-access-and-ingress

This would eliminate the need for creating wildcard DNS entries such as 
`*.example.com`.
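
A templating mechanism of this kind could look roughly like the sketch below. The placeholder syntax (`{{name}}`, `{{namespace}}`) and the class `IngressTemplateSketch` are assumptions made for illustration; the issue does not define the actual template format.

```java
import java.util.Map;

/** Hypothetical template renderer, not operator code. */
final class IngressTemplateSketch {

    /** Replaces every "{{key}}" placeholder in the template with its value. */
    static String render(String template, Map<String, String> variables) {
        String result = template;
        for (Map.Entry<String, String> entry : variables.entrySet()) {
            result = result.replace("{{" + entry.getKey() + "}}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> vars = Map.of("name", "basic-example", "namespace", "default");
        // Host-based template: still relies on a DNS entry per generated host.
        System.out.println(render("{{name}}.{{namespace}}.example.com", vars));
        // Path-based template: a single host serves all deployments.
        System.out.println(render("example.com/{{namespace}}/{{name}}", vars));
    }
}
```

With a path-based template all deployments share one host name, which is what makes the wildcard DNS entry unnecessary.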



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26663) Pod augmentation for the operator

2022-03-15 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26663:
-

 Summary: Pod augmentation for the operator
 Key: FLINK-26663
 URL: https://issues.apache.org/jira/browse/FLINK-26663
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi


Currently we provide no convenient way to augment the operator pod itself. It'd 
be great if we could add something similar to the pod templating mechanism used 
in Flink core.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26659) Document UI access via Ingress

2022-03-15 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26659:
-

 Summary: Document UI access via Ingress
 Key: FLINK-26659
 URL: https://issues.apache.org/jira/browse/FLINK-26659
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26637) Document Basic Concepts and Architecture

2022-03-14 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26637:
-

 Summary: Document Basic Concepts and Architecture
 Key: FLINK-26637
 URL: https://issues.apache.org/jira/browse/FLINK-26637
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26546) Extract Observer Interface

2022-03-09 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26546:
-

 Summary: Extract Observer Interface
 Key: FLINK-26546
 URL: https://issues.apache.org/jira/browse/FLINK-26546
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Matyas Orhidi


Similarly to the Reconciler Interface we should extract the Observer interface.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26472) Introduce Savepoint object in JobStatus

2022-03-03 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26472:
-

 Summary: Introduce Savepoint object in JobStatus
 Key: FLINK-26472
 URL: https://issues.apache.org/jira/browse/FLINK-26472
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Matyas Orhidi


We currently store only the `savepointLocation` as a String in the JobStatus. It 
would be beneficial to introduce a Savepoint object with a few additional 
fields instead:
 * {{String location}}
 * {{String timestamp}}
 * {{boolean success}}
 * {{String error}}
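
The proposed fields could be modelled along these lines. This is only a sketch: the class name {{SavepointSketch}}, the factory methods, and the choice of an ISO-8601 string for the timestamp are assumptions, not the final API.

```java
import java.time.Instant;

/** Hypothetical value object holding the four proposed fields. */
final class SavepointSketch {
    final String location;   // savepoint path, null if the savepoint failed
    final String timestamp;  // ISO-8601 instant of the attempt (assumed format)
    final boolean success;
    final String error;      // failure reason, null on success

    private SavepointSketch(String location, String timestamp, boolean success, String error) {
        this.location = location;
        this.timestamp = timestamp;
        this.success = success;
        this.error = error;
    }

    /** Creates an entry for a completed savepoint. */
    static SavepointSketch succeeded(String location, Instant at) {
        return new SavepointSketch(location, at.toString(), true, null);
    }

    /** Creates an entry for a failed savepoint attempt. */
    static SavepointSketch failed(Instant at, String error) {
        return new SavepointSketch(null, at.toString(), false, error);
    }

    public static void main(String[] args) {
        SavepointSketch ok = succeeded("s3://bucket/savepoints/sp-1", Instant.now());
        SavepointSketch ko = failed(Instant.now(), "timeout");
        System.out.println(ok.success + " " + ok.location);
        System.out.println(ko.success + " " + ko.error);
    }
}
```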



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26328) Control Logging Behavior in Flink Deployments

2022-02-23 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26328:
-

 Summary: Control Logging Behavior in Flink Deployments
 Key: FLINK-26328
 URL: https://issues.apache.org/jira/browse/FLINK-26328
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi


Looking at 
[https://github.com/spotify/flink-on-k8s-operator/blob/master/docs/user_guide.md#control-logging-behavior],
something similar could work here as well:
{quote}The default logging configuration provided by the operator sends logs 
from JobManager and TaskManager to {{stdout}}. This has the effect of 
making it so that logging from Flink workloads running on Kubernetes behaves 
like every other Kubernetes pod. Your Flink logs should be stored wherever you 
generally expect to see your container logs in your environment.

Sometimes, however, this is not a good fit. An example of when you might want 
to customize logging behavior is to restore the visibility of logs in the Flink 
JobManager web interface. Or you might want to ship logs directly to a 
different sink, or using a different formatter.

You can use the {{spec.logConfig}} field to fully control the log4j and logback 
configuration. It is a string-to-string map, whose keys and values become 
filenames and contents (respectively) in the folder {{/opt/flink/conf}} in each 
container. The default Flink docker entrypoint expects this directory to 
contain two files: {{log4j-console.properties}} and {{logback-console.xml}}.
{quote}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26257) Document metrics configuration for Prometheus

2022-02-18 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26257:
-

 Summary: Document metrics configuration for Prometheus
 Key: FLINK-26257
 URL: https://issues.apache.org/jira/browse/FLINK-26257
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26157) Containers Should Not Run As Root

2022-02-15 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-26157:
-

 Summary: Containers Should Not Run As Root
 Key: FLINK-26157
 URL: https://issues.apache.org/jira/browse/FLINK-26157
 Project: Flink
  Issue Type: Sub-task
Reporter: Matyas Orhidi


Processes in a container should not run as root. Create a user in the 
Dockerfile with a known UID:GID (e.g. flink:flink)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-13957) Redact passwords from dynamic properties on job submission

2019-09-04 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-13957:
-

 Summary: Redact passwords from dynamic properties on job submission
 Key: FLINK-13957
 URL: https://issues.apache.org/jira/browse/FLINK-13957
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission
Affects Versions: 1.9.0
Reporter: Matyas Orhidi
 Fix For: 1.9.1


SSL-related passwords specified by dynamic properties are showing up in 
{{FlinkYarnSessionCli}} logs in plain text:

{{19/09/04 04:57:43 INFO cli.FlinkYarnSessionCli: Dynamic Property set: 
security.ssl.internal.truststore-password=changeit}}
{{19/09/04 04:57:43 INFO cli.FlinkYarnSessionCli: Dynamic Property set: 
security.ssl.internal.keystore-password=changeit}}
{{19/09/04 04:57:43 INFO cli.FlinkYarnSessionCli: Dynamic Property set: 
security.ssl.internal.key-password=changeit}}
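
A redaction step before logging could look like the sketch below. It assumes that a key is sensitive when it contains {{password}} or {{secret}}; that rule, the mask string, and the class {{DynamicPropertyRedactorSketch}} are illustrative choices, not the behaviour of {{FlinkYarnSessionCli}}.

```java
import java.util.Locale;

/** Hypothetical redaction helper, not Flink code. */
final class DynamicPropertyRedactorSketch {

    /** Assumed markers that identify a sensitive key. */
    private static final String[] SENSITIVE_MARKERS = {"password", "secret"};

    private static final String MASK = "******";

    /** Returns the mask for sensitive keys and the original value otherwise. */
    static String redact(String key, String value) {
        String lowerKey = key.toLowerCase(Locale.ROOT);
        for (String marker : SENSITIVE_MARKERS) {
            if (lowerKey.contains(marker)) {
                return MASK;
            }
        }
        return value;
    }

    /** Builds the log message with the value redacted where needed. */
    static String logLine(String key, String value) {
        return "Dynamic Property set: " + key + "=" + redact(key, value);
    }

    public static void main(String[] args) {
        System.out.println(logLine("security.ssl.internal.keystore-password", "changeit"));
        System.out.println(logLine("taskmanager.numberOfTaskSlots", "4"));
    }
}
```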



--
This message was sent by Atlassian Jira
(v8.3.2#803003)