Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-05-27 Thread via GitHub


eladkal closed issue #39236: KubernetesPodOperator duplicating logs when 
interrupted
URL: https://github.com/apache/airflow/issues/39236


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-05-26 Thread via GitHub


fdemiane commented on issue #39236:
URL: https://github.com/apache/airflow/issues/39236#issuecomment-2132400038

   I opened a pull request, but I am not really sure if this is the correct way 
to go, as this is a rare occurrence, and logs might get polluted (space 
consumed is minimal, but still). What do you think? (CC: @eladkal)





Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-05-26 Thread via GitHub


fdemiane commented on issue #39236:
URL: https://github.com/apache/airflow/issues/39236#issuecomment-2132387258

   Looking at the logs, the duplicated lines all fall within one second. In the 
code 
[here](https://github.com/apache/airflow/blob/providers-cncf-kubernetes/7.13.0/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L424),
 we see that `read_pod_logs` takes `since_seconds`, which is expressed in whole 
seconds and is passed to 
[`_client.read_namespaced_pod_log`](https://github.com/apache/airflow/blob/providers-cncf-kubernetes/7.13.0/airflow/providers/cncf/kubernetes/utils/pod_manager.py#L645)
 (docs 
[here](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#read_namespaced_pod_log)),
 which does not support a finer-grained time representation.
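
A minimal sketch of why a whole-second window can re-deliver lines, assuming the resume window is rounded up to the next second so that nothing is missed. `fetch_since` is a hypothetical stand-in for the log API, not the Kubernetes client, and the timestamps are made up for illustration.

```python
import math
from datetime import datetime, timedelta, timezone


def fetch_since(lines, now, since_seconds):
    """Hypothetical stand-in for a log API that only accepts a whole-second look-back."""
    cutoff = now - timedelta(seconds=since_seconds)
    return [(ts, msg) for ts, msg in lines if ts >= cutoff]


base = datetime(2024, 5, 26, 12, 0, 0, tzinfo=timezone.utc)
pod_lines = [
    (base + timedelta(milliseconds=100), "step 1"),
    (base + timedelta(milliseconds=400), "step 2"),
    (base + timedelta(milliseconds=900), "step 3"),
]

# The reader had consumed "step 1" and "step 2" when the stream was interrupted.
last_seen = pod_lines[1][0]

# On resume, 1.6 elapsed seconds are rounded up to 2 so that no line is skipped.
now = base + timedelta(seconds=2)
since_seconds = math.ceil((now - last_seen).total_seconds())

resumed = fetch_since(pod_lines, now, since_seconds)
# Lines at or before the last seen timestamp come back a second time.
duplicates = [msg for ts, msg in resumed if ts <= last_seen]
print(duplicates)
```

Rounding down instead would avoid the repeats here but could silently drop lines, which is presumably why a window that errs on the side of overlap is preferred.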
   
   Also, looking at the [Kubernetes API 
reference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.30/),
 the API itself doesn't seem to support passing a finer-grained time 
representation either. kubectl seems to support a `--since-time` option, which 
takes a timestamp with millisecond precision, as seen 
[here](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_logs/#options).
   
   After a quick search, I found this older issue 
[here](https://github.com/kubernetes-client/python/issues/1351).
   
   The **optimal** fix for this issue is to provide a way to pass a 
since_time in the Kubernetes client (out of scope for Airflow), then make the 
necessary code changes in the KPO.
   A **quick win** would be to add a warning message saying that logs within 
one second might get duplicated (maybe 
[here](airflow/providers/cncf/kubernetes/utils/pod_manager.py)?).
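
For illustration only, a third option would be to filter on the client side using the per-line timestamps. This is a sketch under stated assumptions, not what the provider does: `parse_line` and `drop_already_seen` are hypothetical helpers, and the sample lines use an ISO 8601 prefix with a `+00:00` offset so that `datetime.fromisoformat` can parse them on older Python versions (real pod timestamps may need a more tolerant parser).

```python
from datetime import datetime


def parse_line(raw):
    """Split a 'TIMESTAMP message' line into (datetime, message)."""
    stamp, _, message = raw.partition(" ")
    return datetime.fromisoformat(stamp), message


def drop_already_seen(raw_lines, last_seen):
    """Keep only lines strictly newer than the last timestamp already logged."""
    fresh = []
    for raw in raw_lines:
        ts, message = parse_line(raw)
        if last_seen is not None and ts <= last_seen:
            continue  # delivered before the interruption, skip it
        fresh.append(message)
        last_seen = ts
    return fresh, last_seen


resumed = [
    "2024-05-26T12:00:00.100000+00:00 step 1",
    "2024-05-26T12:00:00.400000+00:00 step 2",
    "2024-05-26T12:00:00.900000+00:00 step 3",
]
last_seen = datetime.fromisoformat("2024-05-26T12:00:00.400000+00:00")
fresh, last_seen = drop_already_seen(resumed, last_seen)
print(fresh)
```

The trade-off is that a genuinely new line carrying exactly the same timestamp as the last one seen would be dropped, so this swaps rare duplicates for rare omissions.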
   





Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-05-26 Thread via GitHub


eladkal commented on issue #39236:
URL: https://github.com/apache/airflow/issues/39236#issuecomment-2132123404

   Some work around this was done in https://github.com/apache/airflow/issues/33498
   cc @fdemiane, maybe you will have time to take a look?





Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-04-27 Thread via GitHub


gbonazzoli commented on issue #39236:
URL: https://github.com/apache/airflow/issues/39236#issuecomment-2080427612

   @raphaelauv 
   
   With version 8.1.1 the problem is still present. It now seems to always 
log "_Pod docker-java-w2ade41b log read interrupted but container 
base still running_".
   
   Airflow's version:
   
   ```bash
   airflow@airflow-test-worker-6cb8744f69-sw7xg:/opt/airflow$ airflow version
   2.9.0
   
   airflow@airflow-test-worker-6cb8744f69-sw7xg:/opt/airflow$ pip list | grep kub
   apache-airflow-providers-cncf-kubernetes 8.1.1
   kubernetes   29.0.0
   kubernetes_asyncio   29.0.0
   ```





Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-04-24 Thread via GitHub


tirkarthi commented on issue #39236:
URL: https://github.com/apache/airflow/issues/39236#issuecomment-2075830089

   Related https://github.com/apache/airflow/issues/33498





Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-04-24 Thread via GitHub


raphaelauv commented on issue #39236:
URL: https://github.com/apache/airflow/issues/39236#issuecomment-2075176634

   Could you try the latest version, 8.1.1, of 
`apache-airflow-providers-cncf-kubernetes`?





Re: [I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-04-24 Thread via GitHub


boring-cyborg[bot] commented on issue #39236:
URL: https://github.com/apache/airflow/issues/39236#issuecomment-2075062319

   Thanks for opening your first issue here! Be sure to follow the issue 
template! If you are willing to raise PR to address this issue please do so, no 
need to wait for approval.
   





[I] KubernetesPodOperator duplicating logs when interrupted [airflow]

2024-04-24 Thread via GitHub


Nikita-Sobolev opened a new issue, #39236:
URL: https://github.com/apache/airflow/issues/39236

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.8.1
   
   ### What happened?
   
   The KubernetesPodOperator duplicates a task's logs when the log read is 
interrupted (`log read interrupted but container base still running`). 
This happens randomly on different DAGs and on different runs of the same DAG. 
I assume it is somehow connected to https://github.com/apache/airflow/issues/35019
   
   ### What you think should happen instead?
   
   Logs should not be duplicated.
   
   ### How to reproduce
   
   KubernetesPodOperator on cloud AKS cluster
   
   ### Operating System
   
   Ubuntu 22.04
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow==2.8.1
   apache-airflow-providers-amazon==8.16.0
   apache-airflow-providers-celery==3.5.1
   apache-airflow-providers-cncf-kubernetes==7.13.0
   apache-airflow-providers-common-io==1.2.0
   apache-airflow-providers-common-sql==1.10.0
   apache-airflow-providers-docker==3.9.1
   apache-airflow-providers-elasticsearch==5.3.1
   apache-airflow-providers-ftp==3.7.0
   apache-airflow-providers-google==10.13.1
   apache-airflow-providers-grpc==3.4.1
   apache-airflow-providers-hashicorp==3.6.1
   apache-airflow-providers-http==4.8.0
   apache-airflow-providers-imap==3.5.0
   apache-airflow-providers-microsoft-azure==8.5.1
   apache-airflow-providers-mysql==5.5.1
   apache-airflow-providers-odbc==4.4.0
   apache-airflow-providers-openlineage==1.4.0
   apache-airflow-providers-postgres==5.10.0
   apache-airflow-providers-redis==3.6.0
   apache-airflow-providers-sendgrid==3.4.0
   apache-airflow-providers-sftp==4.8.1
   apache-airflow-providers-slack==8.5.1
   apache-airflow-providers-snowflake==5.2.1
   apache-airflow-providers-sqlite==3.7.0
   apache-airflow-providers-ssh==3.10.0
   google-cloud-orchestration-airflow==1.10.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   
![Untitled](https://github.com/apache/airflow/assets/59029283/f558cb03-75c0-4b25-ac8a-ff9f2945ece5)
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

