Re: [I] White INFO log is hard to read [airflow]

2024-04-08 Thread via GitHub


tirkarthi commented on issue #38844:
URL: https://github.com/apache/airflow/issues/38844#issuecomment-2044202208

   Do the logs returned from the API have ANSI codes that cause this? 
   
   cc: @jscheffl 
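   For reference, a minimal illustrative sketch (not Airflow's handler code) of how ANSI colour codes could be detected or stripped from a log line returned by the API:
   
   ```python
   import re
   
   # Matches ANSI CSI colour sequences such as "\x1b[37m" (white foreground)
   ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
   
   def strip_ansi(line: str) -> str:
       """Return the line with ANSI colour codes removed."""
       return ANSI_ESCAPE.sub("", line)
   
   print(strip_ansi("\x1b[37mINFO\x1b[0m - task started"))  # -> "INFO - task started"
   ```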


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] clear a deferred task do not increment the tries [airflow]

2024-04-08 Thread via GitHub


tirkarthi commented on issue #38735:
URL: https://github.com/apache/airflow/issues/38735#issuecomment-2044200109

   When a task is deferred, `_try_number` is decremented so that when the 
trigger yields an event and the task resumes execution, `_try_number` is 
incremented again, keeping the same try_number before and after the trigger. 
Perhaps, when the task being cleared is in the deferred state, the try_number 
could be incremented, for example:
   
   ```diff
   commit f08bc8d1326f2c7c271928d74804050c36b4e953
   Author: Karthikeyan Singaravelan 
   Date:   Tue Apr 9 11:10:06 2024 +0530
   
       Increment try_number for cleared deferred tasks.
   
   diff --git a/airflow/models/taskinstance.py b/airflow/models/taskinstance.py
   index d52a71c5b2..62b4d3e821 100644
   --- a/airflow/models/taskinstance.py
   +++ b/airflow/models/taskinstance.py
   @@ -274,6 +274,12 @@ def clear_task_instances(
                    ti.state = TaskInstanceState.RESTARTING
                    job_ids.append(ti.job_id)
            else:
   +            # When the task is deferred the try_number is decremented so that the same try
   +            # number is used when the task handles the event. But in case of clearing the try
   +            # number should be incremented so that the next run doesn't reuse the same try
   +            if ti.state == TaskInstanceState.DEFERRED:
   +                ti._try_number += 1
   +
                ti_dag = dag if dag and dag.dag_id == ti.dag_id else dag_bag.get_dag(ti.dag_id, session=session)
                task_id = ti.task_id
                if ti_dag and ti_dag.has_task(task_id):
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] timeout on Dynamic Task mapping : skipped inner task , task status is still success [airflow]

2024-04-08 Thread via GitHub


raphaelauv commented on issue #37332:
URL: https://github.com/apache/airflow/issues/37332#issuecomment-2044184401

   @ephraimbuddy could you please remove the "wait for response" tag, thanks 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Make _get_ti compatible with RPC [airflow]

2024-04-08 Thread via GitHub


dstandish commented on code in PR #38570:
URL: https://github.com/apache/airflow/pull/38570#discussion_r1556927388


##
airflow/serialization/pydantic/dag_run.py:
##
@@ -78,7 +78,7 @@ def get_task_instances(
     def get_task_instance(
         self,
         task_id: str,
-        session: Session,
+        session: Session | None = None,

Review Comment:
   I think this may be unnecessary. Will revert and see what happens.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Make _get_ti compatible with RPC [airflow]

2024-04-08 Thread via GitHub


dstandish commented on code in PR #38570:
URL: https://github.com/apache/airflow/pull/38570#discussion_r1556922160


##
airflow/cli/commands/task_command.py:
##
@@ -156,8 +156,10 @@ def _get_dag_run(
         raise ValueError(f"unknown create_if_necessary value: {create_if_necessary!r}")
 
 
+@internal_api_call
 @provide_session
-def _get_ti(
+def _get_ti_db_access(
+    dag: DAG,
     task: Operator,

Review Comment:
   ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Make _get_ti compatible with RPC [airflow]

2024-04-08 Thread via GitHub


dstandish commented on code in PR #38570:
URL: https://github.com/apache/airflow/pull/38570#discussion_r1556918483


##
airflow/serialization/serialized_objects.py:
##
@@ -1462,9 +1462,15 @@ def deserialize_dag(cls, encoded_dag: dict[str, Any]) -> SerializedDAG:
                 v = set(v)
             elif k == "tasks":
                 SerializedBaseOperator._load_operator_extra_links = cls._load_operator_extra_links
-
-                v = {task["task_id"]: SerializedBaseOperator.deserialize_operator(task) for task in v}
+                tasks = {}
+                for obj in v:
+                    if obj.get(Encoding.TYPE) == DAT.OP:
+                        deser = SerializedBaseOperator.deserialize_operator(obj[Encoding.VAR])
+                        tasks[deser.task_id] = deser
+                    else:  # this is backcompat for pre-2.10
+                        tasks[obj["task_id"]] = SerializedBaseOperator.deserialize_operator(obj)

Review Comment:
   For better or worse, the default setting is to _not_ reserialize...
   
   
https://github.com/apache/airflow/blob/fecc1ed8eff5192818bbe04cbbdfe9585eaab583/airflow/cli/cli_config.py#L646-L653



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Unable to find Spark Support in helm chart [airflow]

2024-04-08 Thread via GitHub


boring-cyborg[bot] commented on issue #38851:
URL: https://github.com/apache/airflow/issues/38851#issuecomment-2044123456

   Thanks for opening your first issue here! Be sure to follow the issue 
template! If you are willing to raise PR to address this issue please do so, no 
need to wait for approval.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] Unable to find Spark Support in helm chart [airflow]

2024-04-08 Thread via GitHub


anandlingaraj opened a new issue, #38851:
URL: https://github.com/apache/airflow/issues/38851

   ### Apache Airflow version
   
   2.9.0
   
   ### If "Other Airflow 2 version" selected, which one?
   
   apache/airflow:2.8.3
   
   ### What happened?
   
   When selecting the connection type for Spark, it's missing.
   
   ### What you think should happen instead?
   
   Expect Spark connections to be available by default.
   
   ### How to reproduce
   
   Install Apache Spark in AKS/Minikube. Try to add a connection and select Spark.
   
   ### Operating System
   
   Docker
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: try002 for provider cncf kubernetes [airflow]

2024-04-08 Thread via GitHub


eladkal commented on PR #38799:
URL: https://github.com/apache/airflow/pull/38799#issuecomment-2044119678

   There are conflicts to resolve


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add four unit tests for aws/utils [airflow]

2024-04-08 Thread via GitHub


slycyberguy commented on code in PR #38820:
URL: https://github.com/apache/airflow/pull/38820#discussion_r1556846390


##
tests/providers/amazon/aws/utils/test_sagemaker.py:
##


Review Comment:
   I removed those tests. I figured they were necessary because those services 
were listed in issue #35442



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Paginate Airflow task logs [airflow]

2024-04-08 Thread via GitHub


uranusjr commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1556841592


##
airflow/providers/amazon/aws/log/s3_task_handler.py:
##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
   I think what Ash means is we take the Range header in the API, and forward 
it to the log server.
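   
   Purely as an illustration of that idea (not this PR's code; the bucket, key and sizes below are hypothetical), forwarding a byte range to S3 could look roughly like this with boto3:
   
   ```python
   import boto3
   
   s3 = boto3.client("s3")
   
   page_number = 2          # hypothetical page requested by the caller
   page_size = 1024 * 100   # 100 KiB per page, mirroring the TODO above
   start = page_number * page_size
   
   # The HTTP Range header is inclusive on both ends: "bytes=<start>-<end>"
   response = s3.get_object(
       Bucket="my-airflow-logs",                  # hypothetical bucket
       Key="dag_id/task_id/attempt=1.log",        # hypothetical key
       Range=f"bytes={start}-{start + page_size - 1}",
   )
   page = response["Body"].read().decode("utf-8", errors="replace")
   print(len(page))
   ```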



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add the deferrable mode to the Dataflow sensors [airflow]

2024-04-08 Thread via GitHub


Lee-W commented on code in PR #37693:
URL: https://github.com/apache/airflow/pull/37693#discussion_r1556794107


##
airflow/providers/google/cloud/sensors/dataflow.py:
##
@@ -194,27 +244,72 @@ def poke(self, context: Context) -> bool:
             project_id=self.project_id,
             location=self.location,
         )
+        return result["metrics"] if self.callback is None else self.callback(result["metrics"])
+
+    def execute(self, context: Context) -> Any:
+        """Airflow runs this method on the worker and defers using the trigger."""
+        if not self.deferrable:
+            super().execute(context)
+        else:
+            self.defer(
+                timeout=self.execution_timeout,
+                trigger=DataflowJobMetricsTrigger(
+                    job_id=self.job_id,
+                    project_id=self.project_id,
+                    location=self.location,
+                    gcp_conn_id=self.gcp_conn_id,
+                    poll_sleep=self.poll_interval,
+                    impersonation_chain=self.impersonation_chain,
+                    fail_on_terminal_state=self.fail_on_terminal_state,
+                ),
+                method_name="execute_complete",
+            )
 
-        return self.callback(result["metrics"])
+    def execute_complete(self, context: Context, event: dict[str, str | list]) -> Any:
+        """
+        Execute this method when the task resumes its execution on the worker after deferral.
+
+        If the trigger returns an event with success status - passes the event result to the callback function.
+        Returns the event result if no callback function is provided.
+
+        If the trigger returns an event with error status - raises an exception.
+        """
+        if event["status"] == "success":
+            self.log.info(event["message"])
+            return event["result"] if self.callback is None else self.callback(event["result"])
+        if self.soft_fail:

Review Comment:
   ```suggestion
           # TODO: remove this if block when min_airflow_version is set to higher than 2.7.1
           if self.soft_fail:
   ```



##
airflow/providers/google/cloud/sensors/dataflow.py:
##
@@ -115,10 +124,50 @@ def poke(self, context: Context) -> bool:
 
         return False
 
+    def execute(self, context: Context) -> None:
+        """Airflow runs this method on the worker and defers using the trigger."""
+        if not self.deferrable:
+            super().execute(context)
+        elif not self.poke(context=context):
+            self.defer(
+                timeout=self.execution_timeout,
+                trigger=DataflowJobStatusTrigger(
+                    job_id=self.job_id,
+                    expected_statuses=self.expected_statuses,
+                    project_id=self.project_id,
+                    location=self.location,
+                    gcp_conn_id=self.gcp_conn_id,
+                    poll_sleep=self.poll_interval,
+                    impersonation_chain=self.impersonation_chain,
+                ),
+                method_name="execute_complete",
+            )
+
+    def execute_complete(self, context: Context, event: dict[str, str | list]) -> bool:
+        """
+        Execute this method when the task resumes its execution on the worker after deferral.
+
+        Returns True if the trigger returns an event with the success status, otherwise raises
+        an exception.
+        """
+        if event["status"] == "success":
+            self.log.info(event["message"])
+            return True
+        if self.soft_fail:

Review Comment:
   ```suggestion
           # TODO: remove this if block when min_airflow_version is set to higher than 2.7.1
           if self.soft_fail:
   ```



##
airflow/providers/google/cloud/sensors/dataflow.py:
##
@@ -357,4 +498,45 @@ def poke(self, context: Context) -> bool:
             location=self.location,
         )
 
-        return self.callback(result)
+        return result if self.callback is None else self.callback(result)
+
+    def execute(self, context: Context) -> Any:
+        """Airflow runs this method on the worker and defers using the trigger."""
+        if not self.deferrable:
+            super().execute(context)
+        else:
+            self.defer(
+                trigger=DataflowJobAutoScalingEventTrigger(
+                    job_id=self.job_id,
+                    project_id=self.project_id,
+                    location=self.location,
+                    gcp_conn_id=self.gcp_conn_id,
+                    poll_sleep=self.poll_interval,
+                    impersonation_chain=self.impersonation_chain,
+                    fail_on_terminal_state=self.fail_on_terminal_state,
+                ),
+                method_name="execute_complete",
+            )
+
+    def execute_complete(self, context: Context, event: dict[str, str | list]) -> Any:
+        """
+        Execute this 

Re: [PR] Always use the executemany method when inserting rows in DbApiHook as it's way much faster [airflow]

2024-04-08 Thread via GitHub


uranusjr commented on code in PR #38715:
URL: https://github.com/apache/airflow/pull/38715#discussion_r1556827871


##
airflow/providers/common/sql/hooks/sql.py:
##
@@ -166,6 +175,7 @@ def __init__(self, *args, schema: str | None = None, log_sql: bool = True, **kwa
         # Hook deriving from the DBApiHook to still have access to the field in its constructor
         self.__schema = schema
         self.log_sql = log_sql
+        self._fast_executemany = fast_executemany

Review Comment:
   Should this also be set on the class instead? (`supports_fast_executemany`?) 
Or does this depend on some information that’s only available at runtime?
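   
   To illustrate the distinction (the `supports_fast_executemany` name is only the hypothetical one floated above, not an existing attribute), a class-level capability flag versus the instance-level switch might look like:
   
   ```python
   class DbApiHookSketch:
       # Class-level: a static capability of the hook implementation,
       # known without connecting to anything.
       supports_fast_executemany: bool = False
   
       def __init__(self, fast_executemany: bool = False) -> None:
           # Instance-level: a per-connection choice the caller makes at runtime.
           self._fast_executemany = fast_executemany
   
   
   class OdbcHookSketch(DbApiHookSketch):
       supports_fast_executemany = True  # the driver supports it; the user still opts in per hook
   ```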



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add the deferrable mode to the Dataflow sensors [airflow]

2024-04-08 Thread via GitHub


Lee-W commented on code in PR #37693:
URL: https://github.com/apache/airflow/pull/37693#discussion_r1556763479


##
airflow/providers/google/cloud/sensors/dataflow.py:
##
@@ -151,12 +205,14 @@ def __init__(
         self,
         *,
         job_id: str,
-        callback: Callable[[dict], bool],
+        callback: Callable | None = None,

Review Comment:
   Got it. If it can be an arbitrary Callable, this change makes sense. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Deferrable mode for Custom Training Job operators [airflow]

2024-04-08 Thread via GitHub


Lee-W commented on code in PR #38584:
URL: https://github.com/apache/airflow/pull/38584#discussion_r1556745491


##
airflow/providers/google/cloud/operators/vertex_ai/custom_job.py:
##
@@ -539,6 +564,94 @@ def on_kill(self) -> None:
         if self.hook:
             self.hook.cancel_job()
 
+    def execute_complete(self, context: Context, event: dict[str, Any]) -> dict[str, Any] | None:
+        if event["status"] == "error":
+            raise AirflowException(event["message"])
+        result = event["job"]
+        model_id = self.hook.extract_model_id_from_training_pipeline(result)
+        custom_job_id = self.hook.extract_custom_job_id_from_training_pipeline(result)
+        self.xcom_push(context, key="model_id", value=model_id)
+        VertexAIModelLink.persist(context=context, task_instance=self, model_id=model_id)
+        # push custom_job_id to xcom so it could be pulled by other tasks
+        self.xcom_push(context, key="custom_job_id", value=custom_job_id)
+        return result
+
+    def invoke_defer(self, context: Context) -> None:
+        custom_container_training_job_obj: CustomContainerTrainingJob = self.hook.submit_custom_container_training_job(
+            project_id=self.project_id,
+            region=self.region,
+            display_name=self.display_name,
+            command=self.command,
+            container_uri=self.container_uri,
+            model_serving_container_image_uri=self.model_serving_container_image_uri,
+            model_serving_container_predict_route=self.model_serving_container_predict_route,
+            model_serving_container_health_route=self.model_serving_container_health_route,
+            model_serving_container_command=self.model_serving_container_command,
+            model_serving_container_args=self.model_serving_container_args,
+            model_serving_container_environment_variables=self.model_serving_container_environment_variables,
+            model_serving_container_ports=self.model_serving_container_ports,
+            model_description=self.model_description,
+            model_instance_schema_uri=self.model_instance_schema_uri,
+            model_parameters_schema_uri=self.model_parameters_schema_uri,
+            model_prediction_schema_uri=self.model_prediction_schema_uri,
+            parent_model=self.parent_model,
+            is_default_version=self.is_default_version,
+            model_version_aliases=self.model_version_aliases,
+            model_version_description=self.model_version_description,
+            labels=self.labels,
+            training_encryption_spec_key_name=self.training_encryption_spec_key_name,
+            model_encryption_spec_key_name=self.model_encryption_spec_key_name,
+            staging_bucket=self.staging_bucket,
+            # RUN
+            dataset=Dataset(name=self.dataset_id) if self.dataset_id else None,
+            annotation_schema_uri=self.annotation_schema_uri,
+            model_display_name=self.model_display_name,
+            model_labels=self.model_labels,
+            base_output_dir=self.base_output_dir,
+            service_account=self.service_account,
+            network=self.network,
+            bigquery_destination=self.bigquery_destination,
+            args=self.args,
+            environment_variables=self.environment_variables,
+            replica_count=self.replica_count,
+            machine_type=self.machine_type,
+            accelerator_type=self.accelerator_type,
+            accelerator_count=self.accelerator_count,
+            boot_disk_type=self.boot_disk_type,
+            boot_disk_size_gb=self.boot_disk_size_gb,
+            training_fraction_split=self.training_fraction_split,
+            validation_fraction_split=self.validation_fraction_split,
+            test_fraction_split=self.test_fraction_split,
+            training_filter_split=self.training_filter_split,
+            validation_filter_split=self.validation_filter_split,
+            test_filter_split=self.test_filter_split,
+            predefined_split_column_name=self.predefined_split_column_name,
+            timestamp_split_column_name=self.timestamp_split_column_name,
+            tensorboard=self.tensorboard,
+        )
+        custom_container_training_job_obj.wait_for_resource_creation()

Review Comment:
   Yep, sounds good!



##
tests/providers/google/cloud/operators/test_vertex_ai.py:
##
@@ -88,11 +88,18 @@
     ListPipelineJobOperator,
     RunPipelineJobOperator,
 )
-from airflow.providers.google.cloud.triggers.vertex_ai import RunPipelineJobTrigger
+from airflow.providers.google.cloud.triggers.vertex_ai import (
+    CustomContainerTrainingJobTrigger,
+    CustomPythonPackageTrainingJobTrigger,
+    CustomTrainingJobTrigger,
+    RunPipelineJobTrigger,
+)
 from airflow.utils import timezone
 
 VERTEX_AI_PATH = "airflow.providers.google.cloud.operators.vertex_ai.{}"
 VERTEX_AI_LINKS_PATH = 

Re: [PR] Fix AttributeError: 'DagRunNote' object has no attribute 'dag_id' [airflow]

2024-04-08 Thread via GitHub


MyLong commented on code in PR #38843:
URL: https://github.com/apache/airflow/pull/38843#discussion_r1556746373


##
airflow/models/dagrun.py:
##
@@ -1648,7 +1648,5 @@ def __init__(self, content, user_id=None):
         self.user_id = user_id
 
     def __repr__(self):
-        prefix = f"<{self.__class__.__name__}: {self.dag_id}.{self.dagrun_id} {self.run_id}"
-        if self.map_index != -1:
-            prefix += f" map_index={self.map_index}"
+        prefix = f"<{self.__class__.__name__}: {self.dag_run_id}.{self.content}"

Review Comment:
   OK, I will try it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] timeout on Dynamic Task mapping : skipped inner task , task status is still success [airflow]

2024-04-08 Thread via GitHub


github-actions[bot] commented on issue #37332:
URL: https://github.com/apache/airflow/issues/37332#issuecomment-2043917725

   This issue has been automatically marked as stale because it has been open 
for 14 days with no response from the author. It will be closed in next 7 days 
if no further activity occurs from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Make ImapHook mail body encoding configurable [airflow]

2024-04-08 Thread via GitHub


github-actions[bot] commented on PR #37517:
URL: https://github.com/apache/airflow/pull/37517#issuecomment-2043917704

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed in 5 days if no further activity occurs. 
Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] GlueJobOperator failing with Invalid type for parameter GlueVersion for provider version >= 7.1.0. continution of #32404 [airflow]

2024-04-08 Thread via GitHub


don1uppa commented on issue #38848:
URL: https://github.com/apache/airflow/issues/38848#issuecomment-2043872862

   Thank you, the literal option worked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix AttributeError: 'DagRunNote' object has no attribute 'dag_id' [airflow]

2024-04-08 Thread via GitHub


hussein-awala commented on code in PR #38843:
URL: https://github.com/apache/airflow/pull/38843#discussion_r1556607661


##
airflow/models/dagrun.py:
##
@@ -1648,7 +1648,5 @@ def __init__(self, content, user_id=None):
         self.user_id = user_id
 
     def __repr__(self):
-        prefix = f"<{self.__class__.__name__}: {self.dag_id}.{self.dagrun_id} {self.run_id}"
-        if self.map_index != -1:
-            prefix += f" map_index={self.map_index}"
+        prefix = f"<{self.__class__.__name__}: {self.dag_run_id}.{self.content}"

Review Comment:
   This information is not sufficient; we can use the `dag_run` relationship to 
fetch the dag_id and the run_id.
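   
   A rough sketch of that direction (assuming the `dag_run` relationship is loaded; not the final patch):
   
   ```python
   def __repr__(self):
       # Hypothetical: pull the identifiers through the dag_run relationship instead of
       # attributes that do not exist on DagRunNote itself.
       return f"<{self.__class__.__name__}: {self.dag_run.dag_id}.{self.dag_run.run_id} {self.content!r}>"
   ```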



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Amazon Bedrock - Model Throughput Provisioning [airflow]

2024-04-08 Thread via GitHub


ferruzzi commented on code in PR #38850:
URL: https://github.com/apache/airflow/pull/38850#discussion_r1556503642


##
airflow/providers/amazon/aws/operators/bedrock.py:
##
@@ -25,7 +25,10 @@
 from airflow.exceptions import AirflowException

Review Comment:
   This comment is just a safety gate to make sure this PR doesn't get merged 
out of order; anyone can feel free to resolve it once 
https://github.com/apache/airflow/pull/38849 is merged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Amazon Bedrock - Model Throughput Provisioning [airflow]

2024-04-08 Thread via GitHub


ferruzzi commented on code in PR #38850:
URL: https://github.com/apache/airflow/pull/38850#discussion_r1556503642


##
airflow/providers/amazon/aws/operators/bedrock.py:
##
@@ -25,7 +25,10 @@
 from airflow.exceptions import AirflowException

Review Comment:
   This comment is just a safety gate to make sure this one doesn't get merged 
out of order; anyone can feel free to resolve it once 
https://github.com/apache/airflow/pull/38849 is merged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Amazon Bedrock - Model Throughput Provisioning [airflow]

2024-04-08 Thread via GitHub


ferruzzi opened a new pull request, #38850:
URL: https://github.com/apache/airflow/pull/38850

   Adds support for provisioning model throughput on Amazon Bedrock including: 
operators, sensors, waiters, and triggerer, as well as unit tests and system 
tests for all of the above, and doc updates.
   
   Manually tested in Breeze with the Trigger and with the Sensor:
   
![image](https://github.com/apache/airflow/assets/1920178/97573595-935a-43e9-a0ad-be223d6d53b9)
   
   
   Sits on top of a change in https://github.com/apache/airflow/pull/38849, 
please make sure that is merged before this one.
   
   
   
   
   
   
   
   
   
   ---
   **^ Add meaningful description above**
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a 
newsfragment file, named `{pr_number}.significant.rst` or 
`{issue_number}.significant.rst`, in 
[newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Amazon Bedrock - Clean up hook unit tests [airflow]

2024-04-08 Thread via GitHub


ferruzzi opened a new pull request, #38849:
URL: https://github.com/apache/airflow/pull/38849

   @vincbeck  and @Taragolis requested some changes to the Bedrock hooks in 
https://github.com/apache/airflow/pull/38693 and I missed the opportunity to 
greatly simplify the related unit tests.
   
   Static checks for it are currently failing locally, but it looks like an 
issue with `tests/providers/common/sql/test_utils.py:45:`, which @Taragolis 
and @dondaum are maybe already working on?
   
   
   
   
   
   
   
   
   
   ---
   **^ Add meaningful description above**
   Read the **[Pull Request 
Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)**
 for more information.
   In case of fundamental code changes, an Airflow Improvement Proposal 
([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals))
 is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party 
License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a 
newsfragment file, named `{pr_number}.significant.rst` or 
`{issue_number}.significant.rst`, in 
[newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Default credentials cause warnings on Google Chrome after logging in to Airflow's UI [airflow]

2024-04-08 Thread via GitHub


Taragolis commented on issue #38797:
URL: https://github.com/apache/airflow/issues/38797#issuecomment-2043738184

   > but unfortunately, Chrome is not an open source, so I'm not sure what the 
address is in this case.
   
   AFAIK, Chrome bugs can be reported/tracked on the Chromium bug tracker: 
https://issues.chromium.org/issues
   
   There are also some closely related bugs already opened: 
https://issues.chromium.org/issues?q=status:open%20data%20breach
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow-site) branch gh-pages updated (613224189f -> 21f20bf658)

2024-04-08 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


 discard 613224189f Rewritten history to remove past gh-pages deployments
 new 21f20bf658 Rewritten history to remove past gh-pages deployments

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (613224189f)
\
 N -- N -- N   refs/heads/gh-pages (21f20bf658)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 blog/airflow-1.10.10/index.html|   4 +-
 blog/airflow-1.10.12/index.html|   4 +-
 blog/airflow-1.10.8-1.10.9/index.html  |   4 +-
 blog/airflow-2.2.0/index.html  |   4 +-
 blog/airflow-2.3.0/index.html  |   4 +-
 blog/airflow-2.4.0/index.html  |   4 +-
 blog/airflow-2.5.0/index.html  |   4 +-
 blog/airflow-2.6.0/index.html  |   4 +-
 blog/airflow-2.7.0/index.html  |   4 +-
 blog/airflow-2.8.0/index.html  |   4 +-
 .../airflow-2.9.0/custom-log-grouping-expanded.png | Bin 221429 -> 222171 bytes
 blog/airflow-2.9.0/index.html  |   6 +-
 blog/airflow-survey-2020/index.html|   4 +-
 blog/airflow-survey-2022/index.html|   4 +-
 blog/airflow-survey/index.html |   4 +-
 blog/airflow-two-point-oh-is-here/index.html   |   4 +-
 blog/airflow_summit_2021/index.html|   4 +-
 blog/airflow_summit_2022/index.html|   4 +-
 blog/announcing-new-website/index.html |   4 +-
 blog/apache-airflow-for-newcomers/index.html   |   4 +-
 .../index.html |   4 +-
 .../index.html |   4 +-
 .../index.html |   4 +-
 .../index.html |   4 +-
 blog/fab-oid-vulnerability/index.html  |   4 +-
 .../index.html |   4 +-
 blog/index.xml |   2 +-
 blog/introducing_setup_teardown/index.html |   4 +-
 .../index.html |   4 +-
 blog/tags/release/index.xml|   2 +-
 index.xml  |   2 +-
 search/index.html  |   4 +-
 sitemap.xml| 136 ++---
 use-cases/adobe/index.html |   4 +-
 use-cases/adyen/index.html |   4 +-
 use-cases/big-fish-games/index.html|   4 +-
 use-cases/business_operations/index.html   |   4 +-
 use-cases/dish/index.html  |   4 +-
 use-cases/etl_analytics/index.html |   4 +-
 use-cases/experity/index.html  |   4 +-
 use-cases/infrastructure-management/index.html |   4 +-
 use-cases/mlops/index.html |   4 +-
 use-cases/onefootball/index.html   |   4 +-
 use-cases/plarium-krasnodar/index.html |   4 +-
 use-cases/seniorlink/index.html|   4 +-
 use-cases/sift/index.html  |   4 +-
 use-cases/snapp/index.html |   4 +-
 use-cases/suse/index.html  |   4 +-
 48 files changed, 158 insertions(+), 158 deletions(-)



(airflow) branch constraints-main updated: Updating constraints. Github run id:8606550300

2024-04-08 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch constraints-main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/constraints-main by this push:
 new 2c0a5b33b8 Updating constraints. Github run id:8606550300
2c0a5b33b8 is described below

commit 2c0a5b33b8dc9120ffd2cd22074266ad85f2f157
Author: Automated GitHub Actions commit 
AuthorDate: Mon Apr 8 21:50:54 2024 +

Updating constraints. Github run id:8606550300

This update in constraints is automatically committed by the CI 
'constraints-push' step based on
'refs/heads/main' in the 'apache/airflow' repository with commit sha 
94153d70ac894d7c5249d183304646995d5df3e4.

The action that build those constraints can be found at 
https://github.com/apache/airflow/actions/runs/8606550300/

The image tag used for that build was: 
94153d70ac894d7c5249d183304646995d5df3e4. You can enter Breeze environment
with this image by running 'breeze shell --image-tag 
94153d70ac894d7c5249d183304646995d5df3e4'

All tests passed in this build so we determined we can push the updated 
constraints.

See 
https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for 
details.
---
 constraints-3.10.txt  | 10 +-
 constraints-3.11.txt  | 10 +-
 constraints-3.12.txt  |  8 
 constraints-3.8.txt   | 10 +-
 constraints-3.9.txt   | 10 +-
 constraints-no-providers-3.10.txt |  5 -
 constraints-no-providers-3.11.txt |  5 -
 constraints-no-providers-3.12.txt |  5 -
 constraints-no-providers-3.8.txt  |  5 -
 constraints-no-providers-3.9.txt  |  5 -
 constraints-source-providers-3.10.txt | 10 +-
 constraints-source-providers-3.11.txt | 10 +-
 constraints-source-providers-3.12.txt |  8 
 constraints-source-providers-3.8.txt  | 10 +-
 constraints-source-providers-3.9.txt  | 10 +-
 15 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/constraints-3.10.txt b/constraints-3.10.txt
index a8fb9bd639..0a69f53ef5 100644
--- a/constraints-3.10.txt
+++ b/constraints-3.10.txt
@@ -1,6 +1,6 @@
 
 #
-# This constraints file was automatically generated on 
2024-04-07T16:34:04.830710
+# This constraints file was automatically generated on 
2024-04-08T21:21:58.044404
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
@@ -182,7 +182,7 @@ apache-airflow-providers-vertica==3.7.1
 apache-airflow-providers-weaviate==1.3.3
 apache-airflow-providers-yandex==3.9.0
 apache-airflow-providers-zendesk==4.6.0
-apache-beam==2.55.0
+apache-beam==2.55.1
 apispec==6.6.0
 apprise==1.7.5
 argcomplete==3.2.3
@@ -291,7 +291,7 @@ eralchemy2==1.3.8
 et-xmlfile==1.1.0
 eventlet==0.36.1
 exceptiongroup==1.2.0
-execnet==2.1.0
+execnet==2.1.1
 executing==2.0.1
 facebook_business==19.0.2
 fastavro==1.9.4
@@ -316,7 +316,7 @@ google-api-python-client==2.125.0
 google-auth-httplib2==0.2.0
 google-auth-oauthlib==1.2.0
 google-auth==2.29.0
-google-cloud-aiplatform==1.46.0
+google-cloud-aiplatform==1.47.0
 google-cloud-appengine-logging==1.4.3
 google-cloud-audit-log==0.2.5
 google-cloud-automl==2.13.3
@@ -691,7 +691,7 @@ types-certifi==2021.10.8.3
 types-croniter==2.0.0.20240321
 types-docutils==0.20.0.20240406
 types-paramiko==3.4.0.20240311
-types-protobuf==4.24.0.20240311
+types-protobuf==4.24.0.20240408
 types-pyOpenSSL==24.0.0.20240311
 types-python-dateutil==2.9.0.20240316
 types-python-slugify==8.0.2.20240310
diff --git a/constraints-3.11.txt b/constraints-3.11.txt
index de5cb9bd4e..434d455fe5 100644
--- a/constraints-3.11.txt
+++ b/constraints-3.11.txt
@@ -1,6 +1,6 @@
 
 #
-# This constraints file was automatically generated on 
2024-04-07T16:34:04.961048
+# This constraints file was automatically generated on 
2024-04-08T21:21:58.301866
 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow.
 # This variant of constraints install uses the HEAD of the branch version for 
'apache-airflow' but installs
 # the providers from PIP-released packages at the moment of the constraint 
generation.
@@ -182,7 +182,7 @@ apache-airflow-providers-vertica==3.7.1
 apache-airflow-providers-weaviate==1.3.3
 apache-airflow-providers-yandex==3.9.0
 apache-airflow-providers-zendesk==4.6.0
-apache-beam==2.55.0
+apache-beam==2.55.1
 apispec==6.6.0
 apprise==1.7.5
 argcomplete==3.2.3
@@ -289,7 +289,7 @@ entrypoints==0.4
 eralchemy2==1.3.8
 et-xmlfile==1.1.0
 eventlet==0.36.1
-execnet==2.1.0
+execnet==2.1.1
 executing==2.0.1
 facebook_business==19.0.2
 fastavro==1.9.4
@@ -314,7 +314,7 @@ google-api-python-client==2.125.0
 

(airflow-site) branch main updated: Fix typo in 2.9 blog post (#1000)

2024-04-08 Thread jedcunningham
This is an automated email from the ASF dual-hosted git repository.

jedcunningham pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


The following commit(s) were added to refs/heads/main by this push:
 new 265ea2fea6 Fix typo in 2.9 blog post (#1000)
265ea2fea6 is described below

commit 265ea2fea662e28996fbbf8eb24b15a64c4773bb
Author: Jed Cunningham <66968678+jedcunning...@users.noreply.github.com>
AuthorDate: Mon Apr 8 17:34:56 2024 -0400

Fix typo in 2.9 blog post (#1000)
---
 .../airflow-2.9.0/custom-log-grouping-expanded.png | Bin 221429 -> 222171 bytes
 .../site/content/en/blog/airflow-2.9.0/index.md|   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
 
b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
index 0b209264c3..80b1f063e8 100644
Binary files 
a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
 and 
b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
 differ
diff --git a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md 
b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md
index b8217f5d5c..9a502cfc8f 100644
--- a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md
+++ b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md
@@ -154,7 +154,7 @@ def big_hello():
     greeting = ""
     for c in "Hello Airflow 2.9":
         greeting += c
-        print(f"Adding {c} to out greeting. Current greeting: {greeting}")
+        print(f"Adding {c} to our greeting. Current greeting: {greeting}")
     print("::endgroup::")
     print(greeting)
 ```



Re: [PR] Fix typo in 2.9 blog post [airflow-site]

2024-04-08 Thread via GitHub


jedcunningham merged PR #1000:
URL: https://github.com/apache/airflow-site/pull/1000


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow-site) branch 2.9.0-blog-post-typo deleted (was d2db4716ad)

2024-04-08 Thread jedcunningham
This is an automated email from the ASF dual-hosted git repository.

jedcunningham pushed a change to branch 2.9.0-blog-post-typo
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


 was d2db4716ad Fix typo in 2.9 blog post

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.



Re: [PR] Make _run_task_by_local_task_job compatible with internal API [airflow]

2024-04-08 Thread via GitHub


dstandish commented on PR #38563:
URL: https://github.com/apache/airflow/pull/38563#issuecomment-2043674071

   OK, renaming this one to `Add session that blows up when using internal 
API` because, with the addition of this, there's no longer a need to make the 
other changes, I think.
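   
   Purely as an illustration of the idea (names here are hypothetical, not this PR's code), such a guard could be a session stand-in that fails loudly on any use:
   
   ```python
   class BlockedSession:
       """Hypothetical placeholder injected when running over the internal API."""
   
       def __getattr__(self, name):
           raise RuntimeError(
               "Direct database access via a session is not allowed "
               "when using the internal API."
           )
   
   
   try:
       BlockedSession().query  # any attribute access blows up
   except RuntimeError as err:
       print(err)
   ```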


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Introduce new config variable to control whether DAG processor outputs to stdout [airflow]

2024-04-08 Thread via GitHub


dimon222 commented on PR #37439:
URL: https://github.com/apache/airflow/pull/37439#issuecomment-2043662173

   How do I select output ONLY to stdout? If I write to both locations I risk 
getting OOM in a container environment, and I don't want to persist it 
perpetually in a PVC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow-site) branch gh-pages updated (12ec5916dd -> 613224189f)

2024-04-08 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


 discard 12ec5916dd Rewritten history to remove past gh-pages deployments
 new 613224189f Rewritten history to remove past gh-pages deployments

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (12ec5916dd)
\
 N -- N -- N   refs/heads/gh-pages (613224189f)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 blog/airflow-1.10.10/index.html|   4 +-
 blog/airflow-1.10.12/index.html|   4 +-
 blog/airflow-1.10.8-1.10.9/index.html  |   4 +-
 blog/airflow-2.2.0/index.html  |   4 +-
 blog/airflow-2.3.0/index.html  |   4 +-
 blog/airflow-2.4.0/index.html  |   4 +-
 blog/airflow-2.5.0/index.html  |   4 +-
 blog/airflow-2.6.0/index.html  |   4 +-
 blog/airflow-2.7.0/index.html  |   4 +-
 blog/airflow-2.8.0/index.html  |   4 +-
 blog/airflow-2.9.0/index.html  |   4 +-
 blog/airflow-survey-2020/index.html|   4 +-
 blog/airflow-survey-2022/index.html|   4 +-
 blog/airflow-survey/index.html |   4 +-
 blog/airflow-two-point-oh-is-here/index.html   |   4 +-
 blog/airflow_summit_2021/index.html|   4 +-
 blog/airflow_summit_2022/index.html|   4 +-
 blog/announcing-new-website/index.html |   4 +-
 blog/apache-airflow-for-newcomers/index.html   |   4 +-
 .../index.html |   4 +-
 .../index.html |   4 +-
 .../index.html |   4 +-
 .../index.html |   4 +-
 blog/fab-oid-vulnerability/index.html  |   4 +-
 .../index.html |   4 +-
 blog/introducing_setup_teardown/index.html |   4 +-
 .../index.html |   4 +-
 community/index.html   |  30 +
 search/index.html  |   4 +-
 sitemap.xml| 136 ++---
 use-cases/adobe/index.html |   4 +-
 use-cases/adyen/index.html |   4 +-
 use-cases/big-fish-games/index.html|   4 +-
 use-cases/business_operations/index.html   |   4 +-
 use-cases/dish/index.html  |   4 +-
 use-cases/etl_analytics/index.html |   4 +-
 use-cases/experity/index.html  |   4 +-
 use-cases/infrastructure-management/index.html |   4 +-
 use-cases/mlops/index.html |   4 +-
 use-cases/onefootball/index.html   |   4 +-
 use-cases/plarium-krasnodar/index.html |   4 +-
 use-cases/seniorlink/index.html|   4 +-
 use-cases/sift/index.html  |   4 +-
 use-cases/snapp/index.html |   4 +-
 use-cases/suse/index.html  |   4 +-
 45 files changed, 184 insertions(+), 154 deletions(-)



Re: [PR] fix: try002 for provider imap [airflow]

2024-04-08 Thread via GitHub


Taragolis commented on code in PR #38804:
URL: https://github.com/apache/airflow/pull/38804#discussion_r1556428173


##
airflow/providers/imap/hooks/imap.py:
##
@@ -219,7 +219,7 @@ def _retrieve_mails_attachments_by_name(
         self, name: str, check_regex: bool, latest_only: bool, mail_folder: str, mail_filter: str
     ) -> list:
         if not self.mail_client:
-            raise Exception("The 'mail_client' should be initialized before!")
+            raise AirflowException("The 'mail_client' should be initialized before!")

Review Comment:
   ```suggestion
               raise RuntimeError("The 'mail_client' should be initialized before!")
   ```
   
   I guess RuntimeError would be fine in this case, when we expect that some 
of the values are initialised before



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow) branch main updated: fix: try002 for provider common sql (#38800)

2024-04-08 Thread taragolis
This is an automated email from the ASF dual-hosted git repository.

taragolis pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 94153d70ac fix: try002 for provider common sql (#38800)
94153d70ac is described below

commit 94153d70ac894d7c5249d183304646995d5df3e4
Author: Sebastian Daum 
AuthorDate: Mon Apr 8 23:00:12 2024 +0200

fix: try002 for provider common sql (#38800)
---
 airflow/providers/common/sql/hooks/sql.py | 10 +++---
 pyproject.toml|  2 --
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/airflow/providers/common/sql/hooks/sql.py b/airflow/providers/common/sql/hooks/sql.py
index 3f324e4f69..7f1536a39b 100644
--- a/airflow/providers/common/sql/hooks/sql.py
+++ b/airflow/providers/common/sql/hooks/sql.py
@@ -40,7 +40,11 @@ import sqlparse
 from more_itertools import chunked
 from sqlalchemy import create_engine
 
-from airflow.exceptions import AirflowException, AirflowProviderDeprecationWarning
+from airflow.exceptions import (
+    AirflowException,
+    AirflowOptionalProviderFeatureException,
+    AirflowProviderDeprecationWarning,
+)
 from airflow.hooks.base import BaseHook
 
 if TYPE_CHECKING:
@@ -230,7 +234,7 @@ class DbApiHook(BaseHook):
         try:
             from pandas.io import sql as psql
         except ImportError:
-            raise Exception(
+            raise AirflowOptionalProviderFeatureException(
                 "pandas library not installed, run: pip install "
                 "'apache-airflow-providers-common-sql[pandas]'."
             )
@@ -257,7 +261,7 @@ class DbApiHook(BaseHook):
         try:
             from pandas.io import sql as psql
         except ImportError:
-            raise Exception(
+            raise AirflowOptionalProviderFeatureException(
                 "pandas library not installed, run: pip install "
                 "'apache-airflow-providers-common-sql[pandas]'."
             )
diff --git a/pyproject.toml b/pyproject.toml
index bac8b3126c..58d1b31c4b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -381,8 +381,6 @@ combine-as-imports = true
 # All the providers modules which do not follow TRY002 yet
 # cncf.kubernetes
 "airflow/providers/cncf/kubernetes/operators/pod.py" = ["TRY002"]
-# common.sql
-"airflow/providers/common/sql/hooks/sql.py" = ["TRY002"]
 # google
 "airflow/providers/google/cloud/hooks/bigquery.py" = ["TRY002"]
 "airflow/providers/google/cloud/hooks/dataflow.py" = ["TRY002"]



Re: [PR] fix: try002 for provider common sql [airflow]

2024-04-08 Thread via GitHub


Taragolis merged PR #38800:
URL: https://github.com/apache/airflow/pull/38800


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] GlueJobOperator failing with Invalid type for parameter GlueVersion for provider version >= 7.1.0. continution of #32404 [airflow]

2024-04-08 Thread via GitHub


Taragolis commented on issue #38848:
URL: https://github.com/apache/airflow/issues/38848#issuecomment-2043607725

   ``render_template_as_native_obj=True`` will force the use of 
`jinja2.nativetypes.NativeEnvironment`, which converts rendered values to 
native Python types, see example:
   
   ```python
   from jinja2.sandbox import SandboxedEnvironment
   from jinja2.nativetypes import NativeEnvironment
   
   environments = {"sandboxed": SandboxedEnvironment(), "native": NativeEnvironment()}
   
   template = "4.0"
   for name, env in environments.items():
       print(f" Jinja2 Environment: {name} ".center(72, "="))
       rendered = env.from_string(template).render()
       print(f"Template: {template!r}")
       print(f"Rendered: {rendered!r}, Type: {type(rendered).__name__!r}")
   ```
   ```console
   ==================== Jinja2 Environment: sandboxed =====================
   Template: '4.0'
   Rendered: '4.0', Type: 'str'
   ====================== Jinja2 Environment: native ======================
   Template: '4.0'
   Rendered: 4.0, Type: 'float'
   ```
   
   There are two options here.
   
   **Option 1**: 
   Do not use `jinja2.nativetypes.NativeEnvironment`, e.g. 
`render_template_as_native_obj=True`
   
   **Option 2** (Airflow 2.8+):
   Use the `airflow.template.templater.LiteralValue` wrapper (see: 
https://github.com/apache/airflow/pull/35017), which prevents the value from 
being templated, e.g.
   
   ```diff
   +from airflow.template.templater import LiteralValue
   
   ...
   
   create_job_kwargs={
   'WorkerType': 'G.2X',
   'NumberOfWorkers': '1',
   'DefaultArguments': {
   '--datalake-formats': 'iceberg',
   '--additional-python-modules': 'importlib',
   },
   -   'GlueVersion': '4.0'
   +   'GlueVersion': LiteralValue('4.0')
   },
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Fix typo in 2.9 blog post [airflow-site]

2024-04-08 Thread via GitHub


jedcunningham opened a new pull request, #1000:
URL: https://github.com/apache/airflow-site/pull/1000

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow-site) 01/01: Fix typo in 2.9 blog post

2024-04-08 Thread jedcunningham
This is an automated email from the ASF dual-hosted git repository.

jedcunningham pushed a commit to branch 2.9.0-blog-post-typo
in repository https://gitbox.apache.org/repos/asf/airflow-site.git

commit d2db4716ad049c85cd63c460e2e57cddfb612a02
Author: Jed Cunningham 
AuthorDate: Mon Apr 8 15:25:15 2024 -0500

Fix typo in 2.9 blog post
---
 .../airflow-2.9.0/custom-log-grouping-expanded.png | Bin 221429 -> 222171 bytes
 .../site/content/en/blog/airflow-2.9.0/index.md|   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
 
b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
index 0b209264c3..80b1f063e8 100644
Binary files 
a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
 and 
b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png
 differ
diff --git a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md 
b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md
index b8217f5d5c..9a502cfc8f 100644
--- a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md
+++ b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md
@@ -154,7 +154,7 @@ def big_hello():
     greeting = ""
     for c in "Hello Airflow 2.9":
         greeting += c
-        print(f"Adding {c} to out greeting. Current greeting: {greeting}")
+        print(f"Adding {c} to our greeting. Current greeting: {greeting}")
     print("::endgroup::")
     print(greeting)
 ```



(airflow-site) branch 2.9.0-blog-post-typo created (now d2db4716ad)

2024-04-08 Thread jedcunningham
This is an automated email from the ASF dual-hosted git repository.

jedcunningham pushed a change to branch 2.9.0-blog-post-typo
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


  at d2db4716ad Fix typo in 2.9 blog post

This branch includes the following new commits:

 new d2db4716ad Fix typo in 2.9 blog post

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




(airflow) branch main updated: Amazon Bedrock - Model Customization Jobs (#38693)

2024-04-08 Thread ferruzzi
This is an automated email from the ASF dual-hosted git repository.

ferruzzi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 7ed31d5fdf Amazon Bedrock - Model Customization Jobs (#38693)
7ed31d5fdf is described below

commit 7ed31d5fdf510e00528522ea313a20b19e498522
Author: D. Ferruzzi 
AuthorDate: Mon Apr 8 13:22:16 2024 -0700

Amazon Bedrock - Model Customization Jobs (#38693)

* Amazon Bedrock - Customize Model Operator/Sensor/Waiter/Trigger
---
 airflow/providers/amazon/aws/hooks/bedrock.py  |  20 +++
 airflow/providers/amazon/aws/operators/bedrock.py  | 161 -
 airflow/providers/amazon/aws/sensors/bedrock.py| 110 ++
 airflow/providers/amazon/aws/triggers/bedrock.py   |  61 
 airflow/providers/amazon/aws/waiters/bedrock.json  |  42 ++
 airflow/providers/amazon/provider.yaml |   6 +
 .../operators/bedrock.rst  |  38 +
 tests/providers/amazon/aws/hooks/test_bedrock.py   |  36 -
 .../providers/amazon/aws/operators/test_bedrock.py | 161 ++---
 tests/providers/amazon/aws/sensors/test_bedrock.py |  95 
 .../providers/amazon/aws/triggers/test_bedrock.py  |  53 +++
 tests/providers/amazon/aws/waiters/test_bedrock.py |  70 +
 .../system/providers/amazon/aws/example_bedrock.py | 106 +-
 13 files changed, 929 insertions(+), 30 deletions(-)

diff --git a/airflow/providers/amazon/aws/hooks/bedrock.py 
b/airflow/providers/amazon/aws/hooks/bedrock.py
index 11bacd9414..96636eb952 100644
--- a/airflow/providers/amazon/aws/hooks/bedrock.py
+++ b/airflow/providers/amazon/aws/hooks/bedrock.py
@@ -19,6 +19,26 @@ from __future__ import annotations
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
 
 
+class BedrockHook(AwsBaseHook):
+"""
+Interact with Amazon Bedrock.
+
+Provide thin wrapper around 
:external+boto3:py:class:`boto3.client("bedrock") <Bedrock.Client>`.
+
+Additional arguments (such as ``aws_conn_id``) may be specified and
+are passed down to the underlying AwsBaseHook.
+
+.. seealso::
+- :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
+"""
+
+client_type = "bedrock"
+
+def __init__(self, *args, **kwargs) -> None:
+kwargs["client_type"] = self.client_type
+super().__init__(*args, **kwargs)
+
+
 class BedrockRuntimeHook(AwsBaseHook):
 """
 Interact with the Amazon Bedrock Runtime.
diff --git a/airflow/providers/amazon/aws/operators/bedrock.py 
b/airflow/providers/amazon/aws/operators/bedrock.py
index d8eaf9e5d3..ee34a9aef7 100644
--- a/airflow/providers/amazon/aws/operators/bedrock.py
+++ b/airflow/providers/amazon/aws/operators/bedrock.py
@@ -19,10 +19,17 @@ from __future__ import annotations
 import json
 from typing import TYPE_CHECKING, Any, Sequence
 
-from airflow.providers.amazon.aws.hooks.bedrock import BedrockRuntimeHook
+from botocore.exceptions import ClientError
+
+from airflow.configuration import conf
+from airflow.exceptions import AirflowException
+from airflow.providers.amazon.aws.hooks.bedrock import BedrockHook, 
BedrockRuntimeHook
 from airflow.providers.amazon.aws.operators.base_aws import AwsBaseOperator
+from airflow.providers.amazon.aws.triggers.bedrock import 
BedrockCustomizeModelCompletedTrigger
+from airflow.providers.amazon.aws.utils import validate_execute_complete_event
 from airflow.providers.amazon.aws.utils.mixins import aws_template_fields
 from airflow.utils.helpers import prune_dict
+from airflow.utils.timezone import utcnow
 
 if TYPE_CHECKING:
 from airflow.utils.context import Context
@@ -91,3 +98,155 @@ class 
BedrockInvokeModelOperator(AwsBaseOperator[BedrockRuntimeHook]):
 self.log.info("Bedrock %s prompt: %s", self.model_id, self.input_data)
 self.log.info("Bedrock model response: %s", response_body)
 return response_body
+
+
+class BedrockCustomizeModelOperator(AwsBaseOperator[BedrockHook]):
+"""
+Create a fine-tuning job to customize a base model.
+
+.. seealso::
+For more information on how to use this operator, take a look at the 
guide:
+:ref:`howto/operator:BedrockCustomizeModelOperator`
+
+:param job_name: A unique name for the fine-tuning job.
+:param custom_model_name: A name for the custom model being created.
+:param role_arn: The Amazon Resource Name (ARN) of an IAM role that Amazon 
Bedrock can assume
+to perform tasks on your behalf.
+:param base_model_id: Name of the base model.
+:param training_data_uri: The S3 URI where the training data is stored.
+:param output_data_uri: The S3 URI where the output data is stored.
+:param hyperparameters: Parameters related to tuning the model.
+:param ensure_unique_job_name: If set to true, operator will check whether 
a model customization
+job 

(airflow-site) branch add-weilee-to-committer-list deleted (was 43cd4e89d8)

2024-04-08 Thread jedcunningham
This is an automated email from the ASF dual-hosted git repository.

jedcunningham pushed a change to branch add-weilee-to-committer-list
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


 was 43cd4e89d8 add Wei Lee to committer list

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.



(airflow-site) branch main updated: add Wei Lee to committer list (#996)

2024-04-08 Thread jedcunningham
This is an automated email from the ASF dual-hosted git repository.

jedcunningham pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow-site.git


The following commit(s) were added to refs/heads/main by this push:
 new e1c24fbde7 add Wei Lee to committer list (#996)
e1c24fbde7 is described below

commit e1c24fbde7eaa8157a3baa207ca271fb9f82b2b7
Author: Wei Lee 
AuthorDate: Tue Apr 9 04:22:57 2024 +0800

add Wei Lee to committer list (#996)
---
 landing-pages/site/data/committers.json | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/landing-pages/site/data/committers.json 
b/landing-pages/site/data/committers.json
index 71e5b6a188..5242472022 100644
--- a/landing-pages/site/data/committers.json
+++ b/landing-pages/site/data/committers.json
@@ -161,6 +161,12 @@
 "image": "https://github.com/vincbeck.png;,
 "nick": "vincbeck"
   },
+  {
+"name": "Wei Lee",
+"github": "https://github.com/Lee-W;,
+"image": "https://github.com/Lee-W.png;,
+"nick": "Lee-W"
+  },
   {
 "name": "Xinbin Huang",
 "github": "https://github.com/xinbinhuang;,



Re: [PR] add Wei Lee to committer list [airflow-site]

2024-04-08 Thread via GitHub


jedcunningham merged PR #996:
URL: https://github.com/apache/airflow-site/pull/996


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]

2024-04-08 Thread via GitHub


ferruzzi merged PR #38693:
URL: https://github.com/apache/airflow/pull/38693


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Fix eks.py SyntaxWarning: invalid escape sequence '\s' [airflow]

2024-04-08 Thread via GitHub


vincbeck commented on PR #38734:
URL: https://github.com/apache/airflow/pull/38734#issuecomment-2043541309

   Hi @dabla, it seems this PR introduced a bug. All our system tests related 
to AWS EKS started to fail for the same reason. See error below:
   
   ```
   ERROR [root] exec: failed to decode process output: Expecting value: line 1 
column 41 (char 40)
   INFO  [airflow.models.taskinstance] ::group::Post task execution logs
   ERROR [airflow.task] Task failed with exception
   Traceback (most recent call last):
 File "/opt/airflow/airflow/models/taskinstance.py", line 478, in 
_execute_task
   result = _execute_callable(context=context, **execute_callable_kwargs)
 File "/opt/airflow/airflow/models/taskinstance.py", line 441, in 
_execute_callable
   return ExecutionCallableRunner(
 File "/opt/airflow/airflow/utils/operator_helpers.py", line 250, in run
   return self.func(*args, **kwargs)
 File "/opt/airflow/airflow/models/baseoperator.py", line 405, in wrapper
   return func(self, *args, **kwargs)
 File "/opt/airflow/airflow/providers/amazon/aws/operators/eks.py", line 
1103, in execute
   return super().execute(context)
 File "/opt/airflow/airflow/models/baseoperator.py", line 405, in wrapper
   return func(self, *args, **kwargs)
 File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", 
line 578, in execute
   return self.execute_sync(context)
 File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", 
line 586, in execute_sync
   self.pod = self.get_or_create_pod(  # must set `self.pod` for `on_kill`
 File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", 
line 542, in get_or_create_pod
   pod = self.find_pod(self.namespace or 
pod_request_obj.metadata.namespace, context=context)
 File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", 
line 524, in find_pod
   pod_list = self.client.list_namespaced_pod(
 File 
"/usr/local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", 
line 15823, in list_namespaced_pod
   return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # 
noqa: E501
 File 
"/usr/local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", 
line 15942, in list_namespaced_pod_with_http_info
   return self.api_client.call_api(
 File 
"/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 
348, in call_api
   return self.__call_api(resource_path, method,
 File 
"/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 
180, in __call_api
   response_data = self.request(
 File 
"/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 
373, in request
   return self.rest_client.GET(url,
 File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", 
line 244, in GET
   return self.request("GET", url,
 File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", 
line 238, in request
   raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (403)
   Reason: Forbidden
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 
'dede46b8-8f1d-496f-b005-5d49cf69f3a8', 'Cache-Control': 'no-cache, private', 
'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 
'X-Kubernetes-Pf-Flowschema-Uid': '70b99781-63dc-4f6a-8e55-475c91222bbc', 
'X-Kubernetes-Pf-Prioritylevel-Uid': '64a41238-df95-49a3-b050-f3a551eb9d19', 
'Date': 'Mon, 08 Apr 2024 18:12:50 GMT', 'Content-Length': '261'})
   HTTP response body: 
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
 is forbidden: User \"system:anonymous\" cannot list resource \"pods\" in API 
group \"\" in the namespace 
\"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
   ```
   
   I am afraid the bug impacts not only AWS system tests but also users. I will 
create a PR to revert this PR.
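
   For anyone trying to reproduce the underlying issue: the snippet below only illustrates the Python escape-sequence behaviour the original PR was addressing and is not the actual `eks.py` code. The risk with this kind of fix is changing the number of backslashes that end up in a generated script, which can break whatever parses its output (as the JSON decode error above suggests).

   ```python
   # In a normal string literal, "\s" is an invalid escape sequence: Python keeps
   # the backslash but warns (DeprecationWarning, and SyntaxWarning from 3.12).
   plain = "grep -o '\\S*'"   # doubled backslash -> runtime value: grep -o '\S*'
   raw = r"grep -o '\S*'"     # raw string -> identical runtime value
   assert plain == raw

   # Doubling a backslash in a string that was already correct changes the text
   # that gets emitted, e.g. into a templated auth script:
   assert "\\s" != "\\\\s"
   ```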


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Automate links to configuration of providers [airflow]

2024-04-08 Thread via GitHub


poorvirohidekar commented on issue #34306:
URL: https://github.com/apache/airflow/issues/34306#issuecomment-2043511356

   @potiuk - From the file mentioned, I understand that the class 
AuthConfigurations renders the template configuration.rst.jinja2, and when I 
look at that template 
(https://github.com/apache/airflow/blob/main/docs/exts/templates/configuration.rst.jinja2),
 it looks like a similar pattern of link is already being generated there. I am 
not sure whether AuthConfigurations is the relevant existing directive for this 
issue. 
   If not, can you help me understand how the links are currently being 
generated, or point me in the right direction? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] GlueJobOperator failing with Invalid type for parameter GlueVersion for provider version >= 7.1.0. continuation of #32404 [airflow]

2024-04-08 Thread via GitHub


don1uppa opened a new issue, #38848:
URL: https://github.com/apache/airflow/issues/38848

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   Version: v2.8.1 current mwaa release
   
   ### What happened?
   
   [2024-04-08, 18:58:44 UTC] {{glue.py:189}} ERROR - Failed to run aws glue 
job, error: Parameter validation failed:
   Invalid type for parameter GlueVersion, value: 4.0, type: <class 'float'>, 
valid types: <class 'str'>
   
   Here is the full stack trace:
   
   ip-10-0-20-19.ec2.internal
   *** Reading remote log from Cloudwatch log_group: airflow-test-mwaa-Task 
log_stream: 
dag_id=glue_operator_failure/run_id=manual__2024-04-08T18_56_04.206370+00_00/task_id=submit_glue_job/attempt=2.log.
   [2024-04-08, 18:58:43 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=non-requeueable deps ti=
   [2024-04-08, 18:58:43 UTC] {{taskinstance.py:1956}} INFO - Dependencies all 
met for dep_context=requeueable deps ti=
   [2024-04-08, 18:58:43 UTC] {{taskinstance.py:2170}} INFO - Starting attempt 
2 of 2
   [2024-04-08, 18:58:43 UTC] {{taskinstance.py:2191}} INFO - Executing 
 on 2024-04-08 18:56:04.206370+00:00
   [2024-04-08, 18:58:43 UTC] {{standard_task_runner.py:60}} INFO - Started 
process 18070 to run task
   [2024-04-08, 18:58:43 UTC] {{standard_task_runner.py:87}} INFO - Running: 
['airflow', 'tasks', 'run', 'glue_operator_failure', 'submit_glue_job', 
'manual__2024-04-08T18:56:04.206370+00:00', '--job-id', '5118', '--raw', 
'--subdir', 'DAGS_FOLDER/test_dags/glue_operator_failure.py', '--cfg-path', 
'/tmp/tmpbdmm1cz0']
   [2024-04-08, 18:58:43 UTC] {{standard_task_runner.py:88}} INFO - Job 5118: 
Subtask submit_glue_job
   [2024-04-08, 18:58:43 UTC] {{task_command.py:423}} INFO - Running 
 on host 
ip-10-0-20-19.ec2.internal
   [2024-04-08, 18:58:43 UTC] {{taskinstance.py:2480}} INFO - Exporting env 
vars: AIRFLOW_CTX_DAG_OWNER='airflow' 
AIRFLOW_CTX_DAG_ID='glue_operator_failure' 
AIRFLOW_CTX_TASK_ID='submit_glue_job' 
AIRFLOW_CTX_EXECUTION_DATE='2024-04-08T18:56:04.206370+00:00' 
AIRFLOW_CTX_TRY_NUMBER='2' 
AIRFLOW_CTX_DAG_RUN_ID='manual__2024-04-08T18:56:04.206370+00:00'
   [2024-04-08, 18:58:43 UTC] {{glue.py:172}} INFO - Initializing AWS Glue Job: 
simple ls. Wait for completion: True
   [2024-04-08, 18:58:43 UTC] {{base.py:83}} INFO - Using connection ID 
'aws_default' for task execution.
   [2024-04-08, 18:58:43 UTC] {{glue.py:161}} INFO - Iam Role Name: 
test-glue-jobs
   [2024-04-08, 18:58:43 UTC] {{glue.py:356}} INFO - Checking if job already 
exists: simple ls
   [2024-04-08, 18:58:44 UTC] {{glue.py:421}} INFO - Creating job: simple ls
   [2024-04-08, 18:58:44 UTC] {{glue.py:189}} ERROR - Failed to run aws glue 
job, error: Parameter validation failed:
   Invalid type for parameter GlueVersion, value: 4.0, type: <class 'float'>, 
valid types: <class 'str'>
   [2024-04-08, 18:58:44 UTC] {{taskinstance.py:2698}} ERROR - Task failed with 
exception
   Traceback (most recent call last):
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
 line 433, in _execute_task
   result = execute_callable(context=context, **execute_callable_kwargs)

 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/glue.py",
 line 177, in execute
   glue_job_run = self.glue_job_hook.initialize_job(self.script_args, 
self.run_job_kwargs)
  

 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py",
 line 183, in initialize_job
   job_name = self.create_or_update_glue_job()
  
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py",
 line 422, in create_or_update_glue_job
   self.conn.create_job(**config)
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/client.py", 
line 553, in _api_call
   return self._make_api_call(operation_name, kwargs)
  ^^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/client.py", 
line 962, in _make_api_call
   request_dict = self._convert_to_request_dict(
  ^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/client.py", 
line 1036, in _convert_to_request_dict
   request_dict = self._serializer.serialize_to_request(
  ^^
 File 
"/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/validate.py", 
line 381, in serialize_to_request
   raise ParamValidationError(report=report.generate_report())
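
   A hedged DAG-side workaround, based on the validation error above: pass `GlueVersion` as a string rather than a float. The task below is only a sketch (the script location, role and worker settings are illustrative):

   ```python
   from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

   submit_glue_job = GlueJobOperator(
       task_id="submit_glue_job",
       job_name="simple ls",
       script_location="s3://my-bucket/scripts/simple_ls.py",  # illustrative
       iam_role_name="test-glue-jobs",
       # boto3's create_job validates GlueVersion as a string, so use "4.0", not 4.0
       create_job_kwargs={"GlueVersion": "4.0", "WorkerType": "G.1X", "NumberOfWorkers": 2},
   )
   ```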
   

(airflow) branch main updated: Airflow 2.9.0 has been released (#38837)

2024-04-08 Thread ephraimanierobi
This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 14a2b2cf98 Airflow 2.9.0 has been released (#38837)
14a2b2cf98 is described below

commit 14a2b2cf98237658c500f3005c30c657c273d513
Author: Ephraim Anierobi 
AuthorDate: Mon Apr 8 20:01:31 2024 +0100

Airflow 2.9.0 has been released (#38837)
---
 .github/ISSUE_TEMPLATE/airflow_bug_report.yml  |   3 +-
 Dockerfile |   2 +-
 README.md  |  10 +-
 RELEASE_NOTES.rst  | 295 +
 airflow/reproducible_build.yaml|   4 +-
 .../installation/supported-versions.rst|   2 +-
 generated/PYPI_README.md   |   8 +-
 newsfragments/36376.significant.rst|  18 --
 newsfragments/36514.significant.rst|  13 -
 newsfragments/37005.significant.rst|  10 -
 newsfragments/37176.significant.rst|   1 -
 newsfragments/37627.significant.rst|   1 -
 newsfragments/37734.significant.rst|   3 -
 newsfragments/37915.significant.rst|   1 -
 newsfragments/37925.bugfix |   1 -
 newsfragments/37988.significant.rst|   1 -
 newsfragments/38025.significant.rst|  19 --
 newsfragments/38094.significant.rst|   5 -
 newsfragments/38401.significant.rst|   5 -
 scripts/ci/pre_commit/supported_versions.py|   2 +-
 20 files changed, 310 insertions(+), 94 deletions(-)

diff --git a/.github/ISSUE_TEMPLATE/airflow_bug_report.yml 
b/.github/ISSUE_TEMPLATE/airflow_bug_report.yml
index 0b9ed94944..47e3793f27 100644
--- a/.github/ISSUE_TEMPLATE/airflow_bug_report.yml
+++ b/.github/ISSUE_TEMPLATE/airflow_bug_report.yml
@@ -25,8 +25,7 @@ body:
 the latest release or main to see if the issue is fixed before 
reporting it.
   multiple: false
   options:
-- "2.9.0b2"
-- "2.8.4"
+- "2.9.0"
 - "main (development)"
 - "Other Airflow 2 version (please specify below)"
 validations:
diff --git a/Dockerfile b/Dockerfile
index 292a378432..94798fbdb7 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -45,7 +45,7 @@ ARG AIRFLOW_UID="5"
 ARG AIRFLOW_USER_HOME_DIR=/home/airflow
 
 # latest released version here
-ARG AIRFLOW_VERSION="2.8.4"
+ARG AIRFLOW_VERSION="2.9.0"
 
 ARG PYTHON_BASE_IMAGE="python:3.8-slim-bookworm"
 
diff --git a/README.md b/README.md
index 1f618e51fa..107b0ae81d 100644
--- a/README.md
+++ b/README.md
@@ -98,7 +98,7 @@ Airflow is not a streaming solution, but it is often used to 
process real-time d
 
 Apache Airflow is tested with:
 
-| | Main version (dev) | Stable version (2.8.4)  |
+| | Main version (dev) | Stable version (2.9.0)  |
 |-||-|
 | Python  | 3.8, 3.9, 3.10, 3.11, 3.12 | 3.8, 3.9, 3.10, 3.11|
 | Platform| AMD64/ARM64(\*)| AMD64/ARM64(\*) |
@@ -180,15 +180,15 @@ them to the appropriate format and workflow that your 
tool requires.
 
 
 ```bash
-pip install 'apache-airflow==2.8.4' \
- --constraint 
"https://raw.githubusercontent.com/apache/airflow/constraints-2.8.4/constraints-3.8.txt;
+pip install 'apache-airflow==2.9.0' \
+ --constraint 
"https://raw.githubusercontent.com/apache/airflow/constraints-2.9.0/constraints-3.8.txt;
 ```
 
 2. Installing with extras (i.e., postgres, google)
 
 ```bash
 pip install 'apache-airflow[postgres,google]==2.8.3' \
- --constraint 
"https://raw.githubusercontent.com/apache/airflow/constraints-2.8.4/constraints-3.8.txt;
+ --constraint 
"https://raw.githubusercontent.com/apache/airflow/constraints-2.9.0/constraints-3.8.txt;
 ```
 
 For information on installing provider packages, check
@@ -293,7 +293,7 @@ Apache Airflow version life cycle:
 
 | Version   | Current Patch/Minor   | State | First Release   | Limited 
Support   | EOL/Terminated   |
 
|---|---|---|-|---|--|
-| 2 | 2.8.4 | Supported | Dec 17, 2020| TBD
   | TBD  |
+| 2 | 2.9.0 | Supported | Dec 17, 2020| TBD
   | TBD  |
 | 1.10  | 1.10.15   | EOL   | Aug 27, 2018| Dec 17, 
2020  | June 17, 2021|
 | 1.9   | 1.9.0 | EOL   | Jan 03, 2018| Aug 27, 
2018  | Aug 27, 2018 |
 | 1.8   | 1.8.2 | EOL   | Mar 19, 2017| Jan 03, 
2018  | Jan 03, 2018 |
diff --git a/RELEASE_NOTES.rst b/RELEASE_NOTES.rst
index 

Re: [PR] Airflow 2.9.0 has been released [airflow]

2024-04-08 Thread via GitHub


ephraimbuddy merged PR #38837:
URL: https://github.com/apache/airflow/pull/38837


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow) branch main updated: Chart: Default Airflow version to 2.9.0 (#38839)

2024-04-08 Thread ephraimanierobi
This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 1d70bbf683 Chart: Default Airflow version to 2.9.0 (#38839)
1d70bbf683 is described below

commit 1d70bbf68336476605a47651da20f7ca6ceae542
Author: Ephraim Anierobi 
AuthorDate: Mon Apr 8 19:32:07 2024 +0100

Chart: Default Airflow version to 2.9.0 (#38839)

* Chart: Default Airflow version to 2.9.0

* fixup! Chart: Default Airflow version to 2.9.0
---
 chart/Chart.yaml  | 20 ++--
 chart/newsfragments/38478.significant.rst |  3 ---
 chart/newsfragments/38839.significant.rst |  3 +++
 chart/values.schema.json  |  4 ++--
 chart/values.yaml |  4 ++--
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/chart/Chart.yaml b/chart/Chart.yaml
index 659a36a628..29ee9ac5b7 100644
--- a/chart/Chart.yaml
+++ b/chart/Chart.yaml
@@ -20,7 +20,7 @@
 apiVersion: v2
 name: airflow
 version: 1.14.0
-appVersion: 2.8.4
+appVersion: 2.9.0
 description: The official Helm chart to deploy Apache Airflow, a platform to
   programmatically author, schedule, and monitor workflows
 home: https://airflow.apache.org/
@@ -47,23 +47,23 @@ annotations:
   url: https://airflow.apache.org/docs/helm-chart/1.14.0/
   artifacthub.io/screenshots: |
 - title: DAGs View
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/dags.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/dags.png
 - title: Datasets View
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/datasets.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/datasets.png
 - title: Grid View
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/grid.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/grid.png
 - title: Graph View
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/graph.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/graph.png
 - title: Calendar View
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/calendar.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/calendar.png
 - title: Variable View
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/variable_hidden.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/variable_hidden.png
 - title: Gantt Chart
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/gantt.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/gantt.png
 - title: Task Duration
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/duration.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/duration.png
 - title: Code View
-  url: 
https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/code.png
+  url: 
https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/code.png
   artifacthub.io/changes: |
 - description: Don't overwrite ``.Values.airflowPodAnnotations``
   kind: fixed
diff --git a/chart/newsfragments/38478.significant.rst 
b/chart/newsfragments/38478.significant.rst
deleted file mode 100644
index 9200834d00..00
--- a/chart/newsfragments/38478.significant.rst
+++ /dev/null
@@ -1,3 +0,0 @@
-Default Airflow image is updated to ``2.8.4``
-
-The default Airflow image that is used with the Chart is now ``2.8.4``, 
previously it was ``2.8.3``.
diff --git a/chart/newsfragments/38839.significant.rst 
b/chart/newsfragments/38839.significant.rst
new file mode 100644
index 00..8f913bf8bc
--- /dev/null
+++ b/chart/newsfragments/38839.significant.rst
@@ -0,0 +1,3 @@
+Default Airflow image is updated to ``2.9.0``
+
+The default Airflow image that is used with the Chart is now ``2.9.0``, 
previously it was ``2.8.3``.
diff --git a/chart/values.schema.json b/chart/values.schema.json
index 47b07c7d44..c4dbe47464 100644
--- a/chart/values.schema.json
+++ b/chart/values.schema.json
@@ -77,7 +77,7 @@
 "defaultAirflowTag": {
 "description": "Default airflow tag to deploy.",
 "type": "string",
-"default": "2.8.4",
+"default": "2.9.0",
 "x-docsSection": "Common"
 },
 "defaultAirflowDigest": {
@@ -92,7 +92,7 @@
 "airflowVersion": {
 "description": "Airflow version (Used to make some decisions based 
on Airflow Version being deployed).",
 "type": "string",
-"default": "2.8.4",
+"default": "2.9.0",
 "x-docsSection": "Common"
 },
 "securityContext": {
diff --git a/chart/values.yaml 

Re: [PR] Chart: Default Airflow version to 2.9.0 [airflow]

2024-04-08 Thread via GitHub


ephraimbuddy merged PR #38839:
URL: https://github.com/apache/airflow/pull/38839


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Change the UI color for ExternalSensor tasks [airflow]

2024-04-08 Thread via GitHub


idantepper commented on issue #38845:
URL: https://github.com/apache/airflow/issues/38845#issuecomment-2043379278

   Hi, I would like to take this issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]

2024-04-08 Thread via GitHub


ferruzzi commented on code in PR #38693:
URL: https://github.com/apache/airflow/pull/38693#discussion_r1556213468


##
airflow/providers/amazon/aws/operators/bedrock.py:
##
@@ -91,3 +98,155 @@ def execute(self, context: Context) -> dict[str, str | int]:
 self.log.info("Bedrock %s prompt: %s", self.model_id, self.input_data)
 self.log.info("Bedrock model response: %s", response_body)
 return response_body
+
+
+class BedrockCustomizeModelOperator(AwsBaseOperator[BedrockHook]):
+"""
+Create a fine-tuning job to customize a base model.
+
+.. seealso::
+For more information on how to use this operator, take a look at the 
guide:
+:ref:`howto/operator:BedrockCustomizeModelOperator`
+
+:param job_name: A unique name for the fine-tuning job.
+:param custom_model_name: A name for the custom model being created.
+:param role_arn: The Amazon Resource Name (ARN) of an IAM role that Amazon 
Bedrock can assume
+to perform tasks on your behalf.
+:param base_model_id: Name of the base model.
+:param training_data_uri: The S3 URI where the training data is stored.
+:param output_data_uri: The S3 URI where the output data is stored.
+:param hyperparameters: Parameters related to tuning the model.
+:param ensure_unique_job_name: If set to true, operator will check whether 
a model customization
+job already exists for the name in the config and append the current 
timestamp if there is a
+name conflict. (Default: True)
+:param customization_job_kwargs: Any optional parameters to pass to the 
API.
+
+:param wait_for_completion: Whether to wait for cluster to stop. (default: 
True)
+:param waiter_delay: Time in seconds to wait between status checks. 
(default: 120)
+:param waiter_max_attempts: Maximum number of attempts to check for job 
completion. (default: 75)
+:param deferrable: If True, the operator will wait asynchronously for the 
cluster to stop.
+This implies waiting for completion. This mode requires aiobotocore 
module to be installed.
+(default: False)
+:param aws_conn_id: The Airflow connection used for AWS credentials.
+If this is ``None`` or empty then the default boto3 behaviour is used. 
If
+running Airflow in a distributed manner and aws_conn_id is None or
+empty, then default boto3 configuration would be used (and must be
+maintained on each worker node).
+:param region_name: AWS region_name. If not specified then the default 
boto3 behaviour is used.
+:param verify: Whether or not to verify SSL certificates. See:
+
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
+:param botocore_config: Configuration dictionary (key-values) for botocore 
client. See:
+
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
+"""
+
+aws_hook_class = BedrockHook
+template_fields: Sequence[str] = aws_template_fields(
+"job_name",
+"custom_model_name",
+"role_arn",
+"base_model_id",
+"hyperparameters",
+"ensure_unique_job_name",
+"customization_job_kwargs",
+)
+
+def __init__(
+self,
+job_name: str,
+custom_model_name: str,
+role_arn: str,
+base_model_id: str,
+training_data_uri: str,
+output_data_uri: str,
+hyperparameters: dict[str, str],
+ensure_unique_job_name: bool = True,
+customization_job_kwargs: dict[str, Any] | None = None,
+wait_for_completion: bool = True,
+waiter_delay: int = 120,
+waiter_max_attempts: int = 75,
+deferrable: bool = conf.getboolean("operators", "default_deferrable", 
fallback=False),
+**kwargs,
+):
+super().__init__(**kwargs)
+self.wait_for_completion = wait_for_completion
+self.waiter_delay = waiter_delay
+self.waiter_max_attempts = waiter_max_attempts
+self.deferrable = deferrable
+
+self.job_name = job_name
+self.custom_model_name = custom_model_name
+self.role_arn = role_arn
+self.base_model_id = base_model_id
+self.training_data_config = {"s3Uri": training_data_uri}
+self.output_data_config = {"s3Uri": output_data_uri}
+self.hyperparameters = hyperparameters
+self.ensure_unique_job_name = ensure_unique_job_name
+self.customization_job_kwargs = customization_job_kwargs or {}
+
+self.valid_action_if_job_exists: set[str] = {"timestamp", "fail"}
+
+def execute_complete(self, context: Context, event: dict[str, Any] | None 
= None) -> str:
+event = validate_execute_complete_event(event)
+
+if event["status"] != "success":
+raise AirflowException(f"Error while running job: {event}")
+
+self.log.info("Bedrock model customization job `%s` complete.", 
self.job_name)
+ 
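
For reference, a hedged usage sketch of the operator under review; the parameter names follow the docstring above, while the concrete values (model id, role ARN, S3 URIs, hyperparameters) are illustrative only:

```python
from airflow.providers.amazon.aws.operators.bedrock import BedrockCustomizeModelOperator

customize_model = BedrockCustomizeModelOperator(
    task_id="customize_model",
    job_name="my-titan-tuning-job",                           # illustrative name
    custom_model_name="my-custom-titan",
    role_arn="arn:aws:iam::123456789012:role/BedrockRole",    # illustrative ARN
    base_model_id="amazon.titan-text-express-v1",
    training_data_uri="s3://my-bucket/train.jsonl",
    output_data_uri="s3://my-bucket/output/",
    hyperparameters={"epochCount": "1", "batchSize": "1"},    # illustrative values
    wait_for_completion=True,
)
```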

Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]

2024-04-08 Thread via GitHub


ferruzzi commented on code in PR #38693:
URL: https://github.com/apache/airflow/pull/38693#discussion_r1556206876


##
airflow/providers/amazon/aws/operators/bedrock.py:
##
@@ -91,3 +98,155 @@ def execute(self, context: Context) -> dict[str, str | int]:
 self.log.info("Bedrock %s prompt: %s", self.model_id, self.input_data)
 self.log.info("Bedrock model response: %s", response_body)
 return response_body
+
+
+class BedrockCustomizeModelOperator(AwsBaseOperator[BedrockHook]):
+"""
+Create a fine-tuning job to customize a base model.
+
+.. seealso::
+For more information on how to use this operator, take a look at the 
guide:
+:ref:`howto/operator:BedrockCustomizeModelOperator`
+
+:param job_name: A unique name for the fine-tuning job.
+:param custom_model_name: A name for the custom model being created.
+:param role_arn: The Amazon Resource Name (ARN) of an IAM role that Amazon 
Bedrock can assume
+to perform tasks on your behalf.
+:param base_model_id: Name of the base model.
+:param training_data_uri: The S3 URI where the training data is stored.
+:param output_data_uri: The S3 URI where the output data is stored.
+:param hyperparameters: Parameters related to tuning the model.
+:param ensure_unique_job_name: If set to true, operator will check whether 
a model customization
+job already exists for the name in the config and append the current 
timestamp if there is a
+name conflict. (Default: True)
+:param customization_job_kwargs: Any optional parameters to pass to the 
API.
+
+:param wait_for_completion: Whether to wait for cluster to stop. (default: 
True)
+:param waiter_delay: Time in seconds to wait between status checks. 
(default: 120)
+:param waiter_max_attempts: Maximum number of attempts to check for job 
completion. (default: 75)
+:param deferrable: If True, the operator will wait asynchronously for the 
cluster to stop.
+This implies waiting for completion. This mode requires aiobotocore 
module to be installed.
+(default: False)
+:param aws_conn_id: The Airflow connection used for AWS credentials.
+If this is ``None`` or empty then the default boto3 behaviour is used. 
If
+running Airflow in a distributed manner and aws_conn_id is None or
+empty, then default boto3 configuration would be used (and must be
+maintained on each worker node).
+:param region_name: AWS region_name. If not specified then the default 
boto3 behaviour is used.
+:param verify: Whether or not to verify SSL certificates. See:
+
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
+:param botocore_config: Configuration dictionary (key-values) for botocore 
client. See:
+
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
+"""
+
+aws_hook_class = BedrockHook
+template_fields: Sequence[str] = aws_template_fields(
+"job_name",
+"custom_model_name",
+"role_arn",
+"base_model_id",
+"hyperparameters",
+"ensure_unique_job_name",
+"customization_job_kwargs",
+)
+
+def __init__(
+self,
+job_name: str,
+custom_model_name: str,
+role_arn: str,
+base_model_id: str,
+training_data_uri: str,
+output_data_uri: str,
+hyperparameters: dict[str, str],
+ensure_unique_job_name: bool = True,
+customization_job_kwargs: dict[str, Any] | None = None,
+wait_for_completion: bool = True,
+waiter_delay: int = 120,
+waiter_max_attempts: int = 75,
+deferrable: bool = conf.getboolean("operators", "default_deferrable", 
fallback=False),
+**kwargs,
+):
+super().__init__(**kwargs)
+self.wait_for_completion = wait_for_completion
+self.waiter_delay = waiter_delay
+self.waiter_max_attempts = waiter_max_attempts
+self.deferrable = deferrable
+
+self.job_name = job_name
+self.custom_model_name = custom_model_name
+self.role_arn = role_arn
+self.base_model_id = base_model_id
+self.training_data_config = {"s3Uri": training_data_uri}
+self.output_data_config = {"s3Uri": output_data_uri}
+self.hyperparameters = hyperparameters
+self.ensure_unique_job_name = ensure_unique_job_name
+self.customization_job_kwargs = customization_job_kwargs or {}
+
+self.valid_action_if_job_exists: set[str] = {"timestamp", "fail"}
+
+def execute_complete(self, context: Context, event: dict[str, Any] | None 
= None) -> str:
+event = validate_execute_complete_event(event)
+
+if event["status"] != "success":
+raise AirflowException(f"Error while running job: {event}")
+
+self.log.info("Bedrock model customization job `%s` complete.", 
self.job_name)
+ 

Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]

2024-04-08 Thread via GitHub


ferruzzi commented on code in PR #38693:
URL: https://github.com/apache/airflow/pull/38693#discussion_r1556199352


##
airflow/providers/amazon/aws/sensors/bedrock.py:
##
@@ -0,0 +1,111 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, Sequence
+
+from airflow.configuration import conf
+from airflow.providers.amazon.aws.sensors.base_aws import AwsBaseSensor
+from airflow.providers.amazon.aws.triggers.bedrock import 
BedrockCustomizeModelCompletedTrigger
+from airflow.providers.amazon.aws.utils.mixins import aws_template_fields
+
+if TYPE_CHECKING:
+from airflow.utils.context import Context
+
+from airflow.exceptions import AirflowException, AirflowSkipException
+from airflow.providers.amazon.aws.hooks.bedrock import BedrockHook
+
+
+class BedrockCustomizeModelCompletedSensor(AwsBaseSensor[BedrockHook]):
+"""
+Poll the state of the model customization job until it reaches a terminal 
state; fails if the job fails.
+
+.. seealso::
+For more information on how to use this sensor, take a look at the 
guide:
+:ref:`howto/sensor:BedrockCustomizeModelCompletedSensor`
+
+
+:param job_name: The name of the Bedrock model customization job.
+
+:param deferrable: If True, the sensor will operate in deferrable mode. 
This mode requires aiobotocore
+module to be installed.
+(default: False, but can be overridden in config file by setting 
default_deferrable to True)
+:param max_retries: Number of times before returning the current state. 
(default: 75)
+:param poke_interval: Polling period in seconds to check for the status of 
the job. (default: 120)
+:param aws_conn_id: The Airflow connection used for AWS credentials.
+If this is ``None`` or empty then the default boto3 behaviour is used. 
If
+running Airflow in a distributed manner and aws_conn_id is None or
+empty, then default boto3 configuration would be used (and must be
+maintained on each worker node).
+:param region_name: AWS region_name. If not specified then the default 
boto3 behaviour is used.
+:param verify: Whether or not to verify SSL certificates. See:
+
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
+:param botocore_config: Configuration dictionary (key-values) for botocore 
client. See:
+
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
+"""
+
+INTERMEDIATE_STATES = ("InProgress",)
+FAILURE_STATES = ("Failed", "Stopping", "Stopped")
+SUCCESS_STATES = ("Completed",)
+FAILURE_MESSAGE = "Bedrock model customization job sensor failed."
+
+aws_hook_class = BedrockHook
+template_fields: Sequence[str] = aws_template_fields("job_name")
+ui_color = "#66c3ff"
+
+def __init__(
+self,
+*,
+job_name: str,
+max_retries: int = 75,
+poke_interval: int = 120,
+deferrable: bool = conf.getboolean("operators", "default_deferrable", 
fallback=False),
+**kwargs: Any,
+) -> None:
+super().__init__(**kwargs)
+self.job_name = job_name
+self.poke_interval = poke_interval
+self.max_retries = max_retries
+self.deferrable = deferrable
+
+def execute(self, context: Context) -> Any:
+if self.deferrable:
+self.defer(
+trigger=BedrockCustomizeModelCompletedTrigger(
+job_name=self.job_name,
+waiter_delay=int(self.poke_interval),
+waiter_max_attempts=self.max_retries,
+aws_conn_id=self.aws_conn_id,
+),
+method_name="poke",
+)
+else:
+super().execute(context=context)
+
+def poke(self, context: Context) -> bool:
+state = self.hook.get_customize_model_job_state(self.job_name)
+
+if state in self.FAILURE_STATES:
+# TODO: remove this if block when min_airflow_version is set to 
higher than 2.7.1
+if self.soft_fail:
+raise AirflowSkipException(self.FAILURE_MESSAGE)
+raise 
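
A hedged usage sketch of the sensor above (values are illustrative; the parameters mirror the `__init__` shown in the diff):

```python
from airflow.providers.amazon.aws.sensors.bedrock import BedrockCustomizeModelCompletedSensor

await_custom_model = BedrockCustomizeModelCompletedSensor(
    task_id="await_custom_model",
    job_name="my-titan-tuning-job",  # illustrative; matches the submitting task
    poke_interval=120,
    max_retries=75,
    deferrable=True,  # requires aiobotocore, per the docstring above
)
```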

Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]

2024-04-08 Thread via GitHub


ferruzzi commented on code in PR #38693:
URL: https://github.com/apache/airflow/pull/38693#discussion_r1556195811


##
airflow/providers/amazon/aws/hooks/bedrock.py:
##
@@ -19,6 +19,37 @@
 from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
 
 
+class BedrockHook(AwsBaseHook):
+"""
+Interact with Amazon Bedrock.
+
+Provide thin wrapper around 
:external+boto3:py:class:`boto3.client("bedrock") <Bedrock.Client>`.
+
+Additional arguments (such as ``aws_conn_id``) may be specified and
+are passed down to the underlying AwsBaseHook.
+
+.. seealso::
+- :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
+"""
+
+client_type = "bedrock"
+
+def __init__(self, *args, **kwargs) -> None:
+kwargs["client_type"] = self.client_type
+super().__init__(*args, **kwargs)
+
+def _get_job_by_name(self, job_name: str):
+return self.conn.get_model_customization_job(jobIdentifier=job_name)
+
+def get_customize_model_job_state(self, job_name: str) -> str:
+state = self._get_job_by_name(job_name)["status"]
+self.log.info("Job '%s' state: %s", job_name, state)
+return state
+
+def get_job_arn(self, job_name: str) -> str:
+return self._get_job_by_name(job_name)["jobArn"]

Review Comment:
   Yeah, they felt handy and made the Sensor unit tests a little easier to 
mock, but I'll drop them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Make _run_task_by_local_task_job compatible with internal API [airflow]

2024-04-08 Thread via GitHub


dstandish commented on code in PR #38563:
URL: https://github.com/apache/airflow/pull/38563#discussion_r1556179515


##
airflow/cli/commands/task_command.py:
##
@@ -278,7 +279,15 @@ def _run_task_by_local_task_job(args, ti: TaskInstance | 
TaskInstancePydantic) -
 external_executor_id=_extract_external_executor_id(args),
 )
 try:
-ret = run_job(job=job_runner.job, execute_callable=job_runner._execute)
+# If internal API is used, we must pass session=None so that one is 
not created
+# by the provide_session decorator.  All downstream functions that 
actually use
+# a session are already set up for RPC
+# todo: perhaps instead we should simply modify the provide_session 
decorator
+# to *not* provide a session when using internal API

Review Comment:
   i'm working on something like this
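
   A rough sketch of what that could look like, purely illustrative: the helper name `InternalApiConfig.get_use_internal_api` and its import path are assumed here, and the real change may take a different shape.

   ```python
   from functools import wraps

   from airflow.api_internal.internal_api_call import InternalApiConfig  # assumed import path
   from airflow.utils.session import provide_session


   def provide_session_unless_internal_api(func):
       """Hypothetical variant: skip session creation when the internal API is in use."""

       @wraps(func)
       def wrapper(*args, **kwargs):
           if InternalApiConfig.get_use_internal_api():
               # Downstream calls are RPC-backed, so pass no session through.
               kwargs.setdefault("session", None)
               return func(*args, **kwargs)
           # Otherwise keep the normal behaviour of creating and closing a session.
           return provide_session(func)(*args, **kwargs)

       return wrapper
   ```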



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Setting use_beeline by default for hive cli connection [airflow]

2024-04-08 Thread via GitHub


eladkal commented on code in PR #38763:
URL: https://github.com/apache/airflow/pull/38763#discussion_r1556176719


##
airflow/providers/apache/hive/CHANGELOG.rst:
##
@@ -27,6 +27,19 @@
 Changelog
 -
 
+8.0.0
+.
+
+
+Breaking changes
+
+
+Changed the default value of ``use_beeline`` in hive cli connection to True.
+Beeline will be always enabled by default in this connection type from now 
onwards.
+
+* ``Setting use_beeline by default for hive cli connection (#38763)``

Review Comment:
   This will be set automatically when the docs are generated.
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow) branch main updated: Add lineage_job_namespace and lineage_job_name OpenLineage macros (#38829)

2024-04-08 Thread mobuchowski
This is an automated email from the ASF dual-hosted git repository.

mobuchowski pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 093ab7e755 Add lineage_job_namespace and lineage_job_name OpenLineage 
macros (#38829)
093ab7e755 is described below

commit 093ab7e7556bad9202e83e9fd6d968c50a5f7cb8
Author: Maxim Martynov 
AuthorDate: Mon Apr 8 20:17:14 2024 +0300

Add lineage_job_namespace and lineage_job_name OpenLineage macros (#38829)
---
 airflow/providers/openlineage/plugins/macros.py| 43 
 .../providers/openlineage/plugins/openlineage.py   |  9 +++-
 airflow/providers/openlineage/utils/utils.py   |  2 +-
 .../macros.rst | 59 ++
 tests/providers/openlineage/plugins/test_macros.py | 36 ++---
 5 files changed, 109 insertions(+), 40 deletions(-)

diff --git a/airflow/providers/openlineage/plugins/macros.py 
b/airflow/providers/openlineage/plugins/macros.py
index 391b29495f..ddfceb3459 100644
--- a/airflow/providers/openlineage/plugins/macros.py
+++ b/airflow/providers/openlineage/plugins/macros.py
@@ -26,22 +26,41 @@ if TYPE_CHECKING:
 from airflow.models import TaskInstance
 
 
+def lineage_job_namespace():
+"""
+Macro function which returns Airflow OpenLineage namespace.
+
+.. seealso::
+For more information take a look at the guide:
+:ref:`howto/macros:openlineage`
+"""
+return conf.namespace()
+
+
+def lineage_job_name(task_instance: TaskInstance):
+"""
+Macro function which returns Airflow task name in OpenLineage format 
(`<dag_id>.<task_id>`).
+
+.. seealso::
+For more information take a look at the guide:
+:ref:`howto/macros:openlineage`
+"""
+return get_job_name(task_instance)
+
+
 def lineage_run_id(task_instance: TaskInstance):
 """
-Macro function which returns the generated run id for a given task.
+Macro function which returns the generated run id (UUID) for a given task.
 
 This can be used to forward the run id from a task to a child run so the 
job hierarchy is preserved.
 
 .. seealso::
-For more information on how to use this operator, take a look at the 
guide:
+For more information take a look at the guide:
 :ref:`howto/macros:openlineage`
 """
-if TYPE_CHECKING:
-assert task_instance.task
-
 return OpenLineageAdapter.build_task_instance_run_id(
 dag_id=task_instance.dag_id,
-task_id=task_instance.task.task_id,
+task_id=task_instance.task_id,
 execution_date=task_instance.execution_date,
 try_number=task_instance.try_number,
 )
@@ -56,9 +75,13 @@ def lineage_parent_id(task_instance: TaskInstance):
 run so the job hierarchy is preserved. Child run can easily create 
ParentRunFacet from these information.
 
 .. seealso::
-For more information on how to use this macro, take a look at the 
guide:
+For more information take a look at the guide:
 :ref:`howto/macros:openlineage`
 """
-job_name = get_job_name(task_instance.task)
-run_id = lineage_run_id(task_instance)
-return f"{conf.namespace()}/{job_name}/{run_id}"
+return "/".join(
+(
+lineage_job_namespace(),
+lineage_job_name(task_instance),
+lineage_run_id(task_instance),
+)
+)
diff --git a/airflow/providers/openlineage/plugins/openlineage.py 
b/airflow/providers/openlineage/plugins/openlineage.py
index a0be47a499..5927929588 100644
--- a/airflow/providers/openlineage/plugins/openlineage.py
+++ b/airflow/providers/openlineage/plugins/openlineage.py
@@ -19,7 +19,12 @@ from __future__ import annotations
 from airflow.plugins_manager import AirflowPlugin
 from airflow.providers.openlineage import conf
 from airflow.providers.openlineage.plugins.listener import 
get_openlineage_listener
-from airflow.providers.openlineage.plugins.macros import lineage_parent_id, 
lineage_run_id
+from airflow.providers.openlineage.plugins.macros import (
+lineage_job_name,
+lineage_job_namespace,
+lineage_parent_id,
+lineage_run_id,
+)
 
 
 class OpenLineageProviderPlugin(AirflowPlugin):
@@ -32,5 +37,5 @@ class OpenLineageProviderPlugin(AirflowPlugin):
 
 name = "OpenLineageProviderPlugin"
 if not conf.is_disabled():
-macros = [lineage_run_id, lineage_parent_id]
+macros = [lineage_job_namespace, lineage_job_name, lineage_run_id, 
lineage_parent_id]
 listeners = [get_openlineage_listener()]
diff --git a/airflow/providers/openlineage/utils/utils.py 
b/airflow/providers/openlineage/utils/utils.py
index fb2263b90d..1c777aff76 100644
--- a/airflow/providers/openlineage/utils/utils.py
+++ b/airflow/providers/openlineage/utils/utils.py
@@ -56,7 +56,7 @@ def get_operator_class(task: BaseOperator) -> type:
 return task.__class__
 
 
-def 
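
A hedged usage sketch of the new macros (plugin macros are exposed to Jinja templates under the plugin name, assumed here to be `macros.OpenLineageProviderPlugin` as registered above; the task itself is illustrative):

```python
from airflow.operators.bash import BashOperator

# Forward the OpenLineage identifiers to a downstream/child job via templating.
report_lineage = BashOperator(
    task_id="report_lineage",
    bash_command=(
        "echo namespace={{ macros.OpenLineageProviderPlugin.lineage_job_namespace() }} "
        "job={{ macros.OpenLineageProviderPlugin.lineage_job_name(task_instance) }} "
        "parent={{ macros.OpenLineageProviderPlugin.lineage_parent_id(task_instance) }}"
    ),
)
```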

Re: [PR] Add lineage_job_namespace and lineage_job_name OpenLineage macros [airflow]

2024-04-08 Thread via GitHub


mobuchowski merged PR #38829:
URL: https://github.com/apache/airflow/pull/38829


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Setting use_beeline by default for hive cli connection [airflow]

2024-04-08 Thread via GitHub


potiuk commented on PR #38763:
URL: https://github.com/apache/airflow/pull/38763#issuecomment-2043244944

   Approved with NIT's


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Setting use_beeline by default for hive cli connection [airflow]

2024-04-08 Thread via GitHub


potiuk commented on code in PR #38763:
URL: https://github.com/apache/airflow/pull/38763#discussion_r1556155471


##
airflow/providers/apache/hive/provider.yaml:
##
@@ -25,6 +25,7 @@ state: ready
 source-date-epoch: 1709554960
 # note that those versions are maintained by release manager - do not update 
them manually
 versions:
+  - 7.0.2

Review Comment:
   And spelling issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Setting use_beeline by default for hive cli connection [airflow]

2024-04-08 Thread via GitHub


potiuk commented on code in PR #38763:
URL: https://github.com/apache/airflow/pull/38763#discussion_r1556154848


##
airflow/providers/apache/hive/provider.yaml:
##
@@ -25,6 +25,7 @@ state: ready
 source-date-epoch: 1709554960
 # note that those versions are maintained by release manager - do not update 
them manually
 versions:
+  - 7.0.2

Review Comment:
   ```suggestion
 - 8.0.0
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow) branch main updated: Improve editable airflow installation by adding preinstalled deps (#38764)

2024-04-08 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new d409b8be96 Improve editable airflow installation by adding 
preinstalled deps (#38764)
d409b8be96 is described below

commit d409b8be966b51207e10129e7709e943a5dc7ac3
Author: Jarek Potiuk 
AuthorDate: Mon Apr 8 18:56:17 2024 +0200

Improve editable airflow installation by adding preinstalled deps (#38764)

* Improve editable airflow installation by adding preinstalled deps

While preparing for a presentation about modernizing the packaging setup
I found out that we can slightly improve the editable installation of
airflow. So far we required a "devel" installation in order to make
a working editable installation of airflow (because the devel extra
was installing dependencies for all the pre-installed providers).

However, that required synchronizing the list of providers installed
by the `devel` dependency with the list of preinstalled providers, and
it also made `pip install -e .` result in a non-working version of
Airflow (because the `fab` provider dependencies are required to start
the airflow webserver).

This PR improves it - in editable mode, instead of adding the
pre-installed providers, we add their dependencies. This way we can
remove the pre-installed providers from the list of devel providers
to install - because now the pre-installed provider dependencies
are installed simply as "required" dependencies.

As a result, a simple `pip install -e .` should now result in a fully
working airflow installation - without all the devel goodies and
without celery and kubernetes dependencies, but fully usable for the
sequential and local executor cases.

Also reviewed and updated the comments in hatch_build.py to better
reflect the purpose and behaviour of some of the methods there.

* Update hatch_build.py

Co-authored-by: Ephraim Anierobi 

-

Co-authored-by: Ephraim Anierobi 
---
 airflow_pre_installed_providers.txt |   9 ---
 hatch_build.py  | 150 ++--
 2 files changed, 110 insertions(+), 49 deletions(-)

diff --git a/airflow_pre_installed_providers.txt 
b/airflow_pre_installed_providers.txt
deleted file mode 100644
index e717ea696e..00
--- a/airflow_pre_installed_providers.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-# List of all the providers that are pre-installed when you run `pip install 
apache-airflow` without extras
-common.io
-common.sql
-fab>=1.0.2rc1
-ftp
-http
-imap
-smtp
-sqlite
diff --git a/hatch_build.py b/hatch_build.py
index 69b6da7a1c..532ee7d3e3 100644
--- a/hatch_build.py
+++ b/hatch_build.py
@@ -273,8 +273,6 @@ DEVEL_EXTRAS: dict[str, list[str]] = {
 "devel": [
 "apache-airflow[celery]",
 "apache-airflow[cncf-kubernetes]",
-"apache-airflow[common-io]",
-"apache-airflow[common-sql]",
 "apache-airflow[devel-debuggers]",
 "apache-airflow[devel-devscripts]",
 "apache-airflow[devel-duckdb]",
@@ -282,11 +280,6 @@ DEVEL_EXTRAS: dict[str, list[str]] = {
 "apache-airflow[devel-sentry]",
 "apache-airflow[devel-static-checks]",
 "apache-airflow[devel-tests]",
-"apache-airflow[fab]",
-"apache-airflow[ftp]",
-"apache-airflow[http]",
-"apache-airflow[imap]",
-"apache-airflow[sqlite]",
 ],
 "devel-all-dbs": [
 "apache-airflow[apache-cassandra]",
@@ -550,11 +543,35 @@ ALL_DYNAMIC_EXTRAS: list[str] = sorted(
 
 
 def get_provider_id(provider_spec: str) -> str:
-# in case provider_spec is "="
-return provider_spec.split(">=")[0]
+"""
+Extract provider id from provider specification.
+
+:param provider_spec: provider specification can be in the form of the 
"PROVIDER_ID" or
+   "apache-airflow-providers-PROVIDER", optionally followed by 
">=VERSION".
+
+:return: short provider_id with `.` instead of `-` in case of `apache` and 
other providers with
+ `-` in the name.
+"""
+_provider_id = provider_spec.split(">=")[0]
+if _provider_id.startswith("apache-airflow-providers-"):
+_provider_id = _provider_id.replace("apache-airflow-providers-", 
"").replace("-", ".")
+return _provider_id
 
 
 def get_provider_requirement(provider_spec: str) -> str:
+"""
+Convert provider specification with provider_id to provider requirement.
+
+The requirement can be used when constructing dependencies. It 
automatically adds pre-release specifier
+in case we are building pre-release version of Airflow. This way we can 
handle the case when airflow
+depends on specific version of the provider that has not yet been released 
- then we release the
+pre-release version of 

Re: [PR] Improve editable airflow installation by adding preinstalled deps [airflow]

2024-04-08 Thread via GitHub


potiuk merged PR #38764:
URL: https://github.com/apache/airflow/pull/38764


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Give `on_task_instance_failed` access to the error that caused the failure [airflow]

2024-04-08 Thread via GitHub


potiuk commented on code in PR #38155:
URL: https://github.com/apache/airflow/pull/38155#discussion_r1556148549


##
airflow/models/taskinstance.py:
##
@@ -1409,7 +1410,9 @@ class TaskInstance(Base, LoggingMixin):
 cascade="all, delete, delete-orphan",
 )
 note = association_proxy("task_instance_note", "content", 
creator=_creator_note)
+
 task: Operator | None = None
+_thread_local_data = threading.local()

Review Comment:
   I just looked at this more closely. Hmm. I don't think we should add the 
thread-local data to the task instance - why would we need to do it? The whole 
point of thread-local data is that you can access it directly without setting a 
variable on the model object that is passed down the stack.
   
   What I **really** thought about is literally two methods in 
`taskinstance.py`:
   
   ```python
   def set_last_error(error: None | str | BaseException):
   ```
   
   ```python
   def get_last_error() -> None | str | BaseException:
   ```
   
   Both using ThreadLocal. No passing extra field in TaskInstance. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Databricks provider does not support Azure managed identity if more than one identity exists (e.g. ADF, AKS) [airflow]

2024-04-08 Thread via GitHub


potiuk commented on issue #38762:
URL: https://github.com/apache/airflow/issues/38762#issuecomment-2043197124

   > Perhaps one of the project maintainers could kindly suggest it if they 
have open channels of communication with Microsoft about their commercial 
offerings?
   
   Not as far as I know. But we would love the Microsoft team to contribute 
this and other relevant features. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Replace dill package to use cloudpickle [airflow]

2024-04-08 Thread via GitHub


potiuk commented on code in PR #38531:
URL: https://github.com/apache/airflow/pull/38531#discussion_r1556118149


##
airflow/models/taskinstance.py:
##
@@ -1287,7 +1287,7 @@ class TaskInstance(Base, LoggingMixin):
 queued_dttm = Column(UtcDateTime)
 queued_by_job_id = Column(Integer)
 pid = Column(Integer)
-executor_config = Column(ExecutorConfigType(pickler=dill))
+executor_config = Column(ExecutorConfigType(pickler=cloudpickle))

Review Comment:
   I believe we cannot use pickle. There are certain executors (the K8S executor 
is the main culprit) that can use configuration that is not picklable, because it 
can contain V1Pod and similar classes for a custom pod template. 
   
   See the example here where you provide a custom Pod definition via 
`executor_config` passed to the task. Such config is serialized when the DAG is 
parsed and stored in the DB, and deserialized by the executor when it attempts 
to execute it:
   
   
https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/kubernetes_executor.html#pod-override
   
   It's likely however that we could swap the default for `cloudpickle` (or 
maybe even choose whichever pickling class is available and importable - we 
could detect it), but then there is an issue of conversion for past tasks in 
case of a changeover - as @hussein-awala mentioned and implemented for encrypted 
kwargs: https://github.com/apache/airflow/pull/38358
   
   I still think it's possible to have the libraries as optional for core (and 
pre-install both in the airflow reference image), but it needs a bit of 
deliberation and the right errors/messaging when there are already tasks in the 
DB which have been serialized using one of them but the library is missing - 
basically handling the case where someone had tasks created in the DB using 
`dill` and starts airflow without `dill` - we should raise an error and tell 
the user to install `dill` to make the on-the-fly conversion of old values. 
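   
   For illustration, a rough sketch of that handling (not the actual `ExecutorConfigType` code; the `serialized_with` marker is a hypothetical field used only to show the idea of detecting which pickler an old row needs and failing with a clear message):
   
   ```python
   from __future__ import annotations
   
   
   def load_executor_config(blob: bytes, serialized_with: str = "cloudpickle"):
       """Deserialize a stored executor_config with the library it was written with."""
       if serialized_with == "dill":
           try:
               import dill as pickler
           except ImportError as e:
               raise RuntimeError(
                   "This value was serialized with dill; install dill so it can be "
                   "converted on the fly."
               ) from e
       else:
           import cloudpickle as pickler
       return pickler.loads(blob)
   ```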



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Databricks provider does not support Azure managed identity if more than one identity exists (e.g. ADF, AKS) [airflow]

2024-04-08 Thread via GitHub


jtv8 commented on issue #38762:
URL: https://github.com/apache/airflow/issues/38762#issuecomment-2043178028

   Hi @ghjklw! I agree that it would be better to leverage the abstractions in 
the official Microsoft libraries - the current implementation is far too low 
level. However, that would require a substantial rewrite. I certainly don't 
have the time to do that, and to be honest even setting up a development 
environment and unit tests for the quick fix I've proposed is a bit beyond my 
comfort zone given my unfamiliarity with Airflow and its architecture.
   
   In an ideal world, I'd like to think Microsoft would contribute this work 
themselves, given that they now provide both Airflow and Databricks 
commercially as managed services, and one would expect them to work together 
via managed identities out of the box.
   
   Perhaps one of the project maintainers could kindly suggest it if they have 
open channels of communication with Microsoft about their commercial offerings?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow) branch main updated (285f037dbc -> 76477b0977)

2024-04-08 Thread ephraimanierobi
This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


from 285f037dbc Remove decorator from rendering fields example (#38827)
 add 76477b0977 Update version added field in config.yml (#38840)

No new revisions were added by this update.

Summary of changes:
 airflow/config_templates/config.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



Re: [PR] Implement start/end/debug for multiple executors in scheduler job [airflow]

2024-04-08 Thread via GitHub


o-nikolas commented on code in PR #38514:
URL: https://github.com/apache/airflow/pull/38514#discussion_r1556103085


##
airflow/jobs/job.py:
##
@@ -104,12 +106,13 @@ class Job(Base, LoggingMixin):
 Only makes sense for SchedulerJob and BackfillJob instances.
 """
 
-def __init__(self, executor=None, heartrate=None, **kwargs):
+def __init__(self, executor: BaseExecutor | None = None, heartrate=None, 
**kwargs):
 # Save init parameters as DB fields
 self.heartbeat_failed = False
 self.hostname = get_hostname()
 if executor:
 self.executor = executor

Review Comment:
   I'll cut a good-first-issue for this :+1:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update version added field in config.yml [airflow]

2024-04-08 Thread via GitHub


ephraimbuddy merged PR #38840:
URL: https://github.com/apache/airflow/pull/38840


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Chart: Default Airflow version to 2.9.0 [airflow]

2024-04-08 Thread via GitHub


ephraimbuddy commented on code in PR #38839:
URL: https://github.com/apache/airflow/pull/38839#discussion_r1556101539


##
chart/newsfragments/38478.significant.rst:
##


Review Comment:
   Thanks. Updated





-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Deferrable mode for Custom Training Job operators [airflow]

2024-04-08 Thread via GitHub


potiuk commented on code in PR #38584:
URL: https://github.com/apache/airflow/pull/38584#discussion_r1556100397


##
airflow/providers/google/cloud/operators/vertex_ai/custom_job.py:
##
@@ -539,6 +564,94 @@ def on_kill(self) -> None:
 if self.hook:
 self.hook.cancel_job()
 
+def execute_complete(self, context: Context, event: dict[str, Any]) -> 
dict[str, Any] | None:
+if event["status"] == "error":
+raise AirflowException(event["message"])
+result = event["job"]
+model_id = self.hook.extract_model_id_from_training_pipeline(result)
+custom_job_id = 
self.hook.extract_custom_job_id_from_training_pipeline(result)
+self.xcom_push(context, key="model_id", value=model_id)
+VertexAIModelLink.persist(context=context, task_instance=self, 
model_id=model_id)
+# push custom_job_id to xcom so it could be pulled by other tasks
+self.xcom_push(context, key="custom_job_id", value=custom_job_id)
+return result
+
+def invoke_defer(self, context: Context) -> None:
+custom_container_training_job_obj: CustomContainerTrainingJob = 
self.hook.submit_custom_container_training_job(
+project_id=self.project_id,
+region=self.region,
+display_name=self.display_name,
+command=self.command,
+container_uri=self.container_uri,
+
model_serving_container_image_uri=self.model_serving_container_image_uri,
+
model_serving_container_predict_route=self.model_serving_container_predict_route,
+
model_serving_container_health_route=self.model_serving_container_health_route,
+
model_serving_container_command=self.model_serving_container_command,
+model_serving_container_args=self.model_serving_container_args,
+
model_serving_container_environment_variables=self.model_serving_container_environment_variables,
+model_serving_container_ports=self.model_serving_container_ports,
+model_description=self.model_description,
+model_instance_schema_uri=self.model_instance_schema_uri,
+model_parameters_schema_uri=self.model_parameters_schema_uri,
+model_prediction_schema_uri=self.model_prediction_schema_uri,
+parent_model=self.parent_model,
+is_default_version=self.is_default_version,
+model_version_aliases=self.model_version_aliases,
+model_version_description=self.model_version_description,
+labels=self.labels,
+
training_encryption_spec_key_name=self.training_encryption_spec_key_name,
+model_encryption_spec_key_name=self.model_encryption_spec_key_name,
+staging_bucket=self.staging_bucket,
+# RUN
+dataset=Dataset(name=self.dataset_id) if self.dataset_id else None,
+annotation_schema_uri=self.annotation_schema_uri,
+model_display_name=self.model_display_name,
+model_labels=self.model_labels,
+base_output_dir=self.base_output_dir,
+service_account=self.service_account,
+network=self.network,
+bigquery_destination=self.bigquery_destination,
+args=self.args,
+environment_variables=self.environment_variables,
+replica_count=self.replica_count,
+machine_type=self.machine_type,
+accelerator_type=self.accelerator_type,
+accelerator_count=self.accelerator_count,
+boot_disk_type=self.boot_disk_type,
+boot_disk_size_gb=self.boot_disk_size_gb,
+training_fraction_split=self.training_fraction_split,
+validation_fraction_split=self.validation_fraction_split,
+test_fraction_split=self.test_fraction_split,
+training_filter_split=self.training_filter_split,
+validation_filter_split=self.validation_filter_split,
+test_filter_split=self.test_filter_split,
+predefined_split_column_name=self.predefined_split_column_name,
+timestamp_split_column_name=self.timestamp_split_column_name,
+tensorboard=self.tensorboard,
+)
+custom_container_training_job_obj.wait_for_resource_creation()

Review Comment:
   @Lee-W ? What do you think?  Left the other comment for you to resolve



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Deferrable mode for Custom Training Job operators [airflow]

2024-04-08 Thread via GitHub


potiuk commented on code in PR #38584:
URL: https://github.com/apache/airflow/pull/38584#discussion_r1556098638


##
airflow/providers/google/cloud/operators/vertex_ai/custom_job.py:
##
@@ -539,6 +564,94 @@ def on_kill(self) -> None:
 if self.hook:
 self.hook.cancel_job()
 
+def execute_complete(self, context: Context, event: dict[str, Any]) -> 
dict[str, Any] | None:
+if event["status"] == "error":
+raise AirflowException(event["message"])
+result = event["job"]
+model_id = self.hook.extract_model_id_from_training_pipeline(result)
+custom_job_id = 
self.hook.extract_custom_job_id_from_training_pipeline(result)
+self.xcom_push(context, key="model_id", value=model_id)
+VertexAIModelLink.persist(context=context, task_instance=self, 
model_id=model_id)
+# push custom_job_id to xcom so it could be pulled by other tasks
+self.xcom_push(context, key="custom_job_id", value=custom_job_id)
+return result
+
+def invoke_defer(self, context: Context) -> None:
+custom_container_training_job_obj: CustomContainerTrainingJob = 
self.hook.submit_custom_container_training_job(
+project_id=self.project_id,
+region=self.region,
+display_name=self.display_name,
+command=self.command,
+container_uri=self.container_uri,
+
model_serving_container_image_uri=self.model_serving_container_image_uri,
+
model_serving_container_predict_route=self.model_serving_container_predict_route,
+
model_serving_container_health_route=self.model_serving_container_health_route,
+
model_serving_container_command=self.model_serving_container_command,
+model_serving_container_args=self.model_serving_container_args,
+
model_serving_container_environment_variables=self.model_serving_container_environment_variables,
+model_serving_container_ports=self.model_serving_container_ports,
+model_description=self.model_description,
+model_instance_schema_uri=self.model_instance_schema_uri,
+model_parameters_schema_uri=self.model_parameters_schema_uri,
+model_prediction_schema_uri=self.model_prediction_schema_uri,
+parent_model=self.parent_model,
+is_default_version=self.is_default_version,
+model_version_aliases=self.model_version_aliases,
+model_version_description=self.model_version_description,
+labels=self.labels,
+
training_encryption_spec_key_name=self.training_encryption_spec_key_name,
+model_encryption_spec_key_name=self.model_encryption_spec_key_name,
+staging_bucket=self.staging_bucket,
+# RUN
+dataset=Dataset(name=self.dataset_id) if self.dataset_id else None,
+annotation_schema_uri=self.annotation_schema_uri,
+model_display_name=self.model_display_name,
+model_labels=self.model_labels,
+base_output_dir=self.base_output_dir,
+service_account=self.service_account,
+network=self.network,
+bigquery_destination=self.bigquery_destination,
+args=self.args,
+environment_variables=self.environment_variables,
+replica_count=self.replica_count,
+machine_type=self.machine_type,
+accelerator_type=self.accelerator_type,
+accelerator_count=self.accelerator_count,
+boot_disk_type=self.boot_disk_type,
+boot_disk_size_gb=self.boot_disk_size_gb,
+training_fraction_split=self.training_fraction_split,
+validation_fraction_split=self.validation_fraction_split,
+test_fraction_split=self.test_fraction_split,
+training_filter_split=self.training_filter_split,
+validation_filter_split=self.validation_filter_split,
+test_filter_split=self.test_filter_split,
+predefined_split_column_name=self.predefined_split_column_name,
+timestamp_split_column_name=self.timestamp_split_column_name,
+tensorboard=self.tensorboard,
+)
+custom_container_training_job_obj.wait_for_resource_creation()

Review Comment:
   Ah, yeah. I just saw more context now, so yes - absolutely no problem with 
it in this case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Remove decorator from rendering fields example [airflow]

2024-04-08 Thread via GitHub


potiuk merged PR #38827:
URL: https://github.com/apache/airflow/pull/38827


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow) branch main updated: Remove decorator from rendering fields example (#38827)

2024-04-08 Thread potiuk
This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
 new 285f037dbc Remove decorator from rendering fields example (#38827)
285f037dbc is described below

commit 285f037dbcc0f5e23c2d6ac99bfd6cab86c96ac3
Author: Elad Kalif <45845474+elad...@users.noreply.github.com>
AuthorDate: Mon Apr 8 19:09:17 2024 +0300

Remove decorator from rendering fields example (#38827)

* Remove decorator from rendering fields example

* fix
---
 docs/apache-airflow/core-concepts/operators.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/core-concepts/operators.rst 
b/docs/apache-airflow/core-concepts/operators.rst
index c7193789bc..0330ef43fb 100644
--- a/docs/apache-airflow/core-concepts/operators.rst
+++ b/docs/apache-airflow/core-concepts/operators.rst
@@ -246,9 +246,9 @@ you can pass ``render_template_as_native_obj=True`` to the 
DAG as follows:
 return json.loads(data_string)
 
 
-@task(task_id="transform")
 def transform(order_data):
 print(type(order_data))
+total_order_value = 0
 for value in order_data.values():
 total_order_value += value
 return {"total_order_value": total_order_value}



Re: [PR] Update ACL during job reset [airflow]

2024-04-08 Thread via GitHub


subham611 commented on code in PR #38741:
URL: https://github.com/apache/airflow/pull/38741#discussion_r1556066734


##
airflow/providers/databricks/hooks/databricks.py:
##
@@ -655,6 +656,16 @@ def get_repo_by_path(self, path: str) -> str | None:
 
 return None
 
+def update_job_permission(self, json: dict[str, Any]) -> dict:
+"""
+Update databricks job permission
+
+:param json: payload
+:return:

Review Comment:
   added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Add Weights and Biases Provider [airflow]

2024-04-08 Thread via GitHub


odaneau-astro closed pull request #38846: Add Weights and Biases Provider
URL: https://github.com/apache/airflow/pull/38846


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Paginate Airflow task logs [airflow]

2024-04-08 Thread via GitHub


RNHTTR commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1556051546


##
airflow/providers/amazon/aws/log/s3_task_handler.py:
##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: 
bool = False) -> str:
 :return: the log found at the remote_log_location
 """
 try:
-return self.hook.read_key(remote_log_location)
+range: str = None
+if page_number is not None:
+page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
   > Does it even need to be API parameters, or could we do what S3 does, and 
do this as an HTTP Range request?
   
   This is currently the plan
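   
   For reference, a minimal sketch of a ranged read (the function name and the page-size default are assumptions, not the PR's API; it only shows that S3 supports HTTP Range requests via boto3):
   
   ```python
   import boto3
   
   
   def read_log_page(bucket: str, key: str, page_number: int, page_size: int = 1024 * 100) -> str:
       """Fetch bytes [page_number * page_size, (page_number + 1) * page_size) of an S3 object."""
       start = page_number * page_size
       end = start + page_size - 1
       s3 = boto3.client("s3")
       response = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
       return response["Body"].read().decode("utf-8", errors="replace")
   ```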



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Correctly select task in DAG Graph View when clicking on its name [airflow]

2024-04-08 Thread via GitHub


bbovenzi commented on code in PR #38782:
URL: https://github.com/apache/airflow/pull/38782#discussion_r1556047065


##
airflow/www/static/js/dag/details/graph/DagNode.tsx:
##
@@ -126,10 +125,6 @@ const DagNode = ({
   label={taskName}
   isOpen={isOpen}
   isGroup={!!childCount}
-  onClick={(e) => {
-e.stopPropagation();

Review Comment:
   We need this for opening and closing task groups. But we can wrap it in an 
if statement `if (!!onToggleCollapse) {...` and have both functions inside. That 
way, the onClick will propagate to selecting the task when it is not a task 
group.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] cannot start 'airflow standalone'; pendulum.tz.timezone(tz) -> TypeError [airflow]

2024-04-08 Thread via GitHub


dhshah13 commented on issue #38714:
URL: https://github.com/apache/airflow/issues/38714#issuecomment-2043070747

   And downgrade the Python version to 3.11 - it is more stable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [2.9.0] DB migrate throws na error on encrypt_trigger_kwargs [airflow]

2024-04-08 Thread via GitHub


potiuk commented on issue #38836:
URL: https://github.com/apache/airflow/issues/38836#issuecomment-2043068257

   cc: @hussein-awala 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



(airflow) branch main updated (87fc581262 -> bfaa4f2012)

2024-04-08 Thread vincbeck
This is an automated email from the ASF dual-hosted git repository.

vincbeck pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


from 87fc581262 Remove false negative SparkInK8s internal deprecations 
(#38777)
 add bfaa4f2012 fix: COMMAND string should be raw to avoid SyntaxWarning: 
invalid escape sequence '\s' (#38734)

No new revisions were added by this update.

Summary of changes:
 airflow/providers/amazon/aws/hooks/eks.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



Re: [PR] Fix eks.py SyntaxWarning: invalid esape sequence '\s' [airflow]

2024-04-08 Thread via GitHub


vincbeck merged PR #38734:
URL: https://github.com/apache/airflow/pull/38734


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]

2024-04-08 Thread via GitHub


vincbeck commented on code in PR #38693:
URL: https://github.com/apache/airflow/pull/38693#discussion_r1556043966


##
airflow/providers/amazon/aws/sensors/bedrock.py:
##
@@ -0,0 +1,111 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, Sequence
+
+from airflow.configuration import conf
+from airflow.providers.amazon.aws.sensors.base_aws import AwsBaseSensor
+from airflow.providers.amazon.aws.triggers.bedrock import 
BedrockCustomizeModelCompletedTrigger
+from airflow.providers.amazon.aws.utils.mixins import aws_template_fields
+
+if TYPE_CHECKING:
+from airflow.utils.context import Context
+
+from airflow.exceptions import AirflowException, AirflowSkipException
+from airflow.providers.amazon.aws.hooks.bedrock import BedrockHook
+
+
+class BedrockCustomizeModelCompletedSensor(AwsBaseSensor[BedrockHook]):
+"""
+Poll the state of the model customization job until it reaches a terminal 
state; fails if the job fails.
+
+.. seealso::
+For more information on how to use this sensor, take a look at the 
guide:
+:ref:`howto/sensor:BedrockCustomizeModelCompletedSensor`
+
+
+:param job_name: The name of the Bedrock model customization job.
+
+:param deferrable: If True, the sensor will operate in deferrable mode. 
This mode requires aiobotocore
+module to be installed.
+(default: False, but can be overridden in config file by setting 
default_deferrable to True)
+:param max_retries: Number of times before returning the current state. 
(default: 75)
+:param poke_interval: Polling period in seconds to check for the status of 
the job. (default: 120)
+:param aws_conn_id: The Airflow connection used for AWS credentials.
+If this is ``None`` or empty then the default boto3 behaviour is used. 
If
+running Airflow in a distributed manner and aws_conn_id is None or
+empty, then default boto3 configuration would be used (and must be
+maintained on each worker node).
+:param region_name: AWS region_name. If not specified then the default 
boto3 behaviour is used.
+:param verify: Whether or not to verify SSL certificates. See:
+
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
+:param botocore_config: Configuration dictionary (key-values) for botocore 
client. See:
+
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
+"""
+
+INTERMEDIATE_STATES = ("InProgress",)
+FAILURE_STATES = ("Failed", "Stopping", "Stopped")
+SUCCESS_STATES = ("Completed",)
+FAILURE_MESSAGE = "Bedrock model customization job sensor failed."
+
+aws_hook_class = BedrockHook
+template_fields: Sequence[str] = aws_template_fields("job_name")
+ui_color = "#66c3ff"
+
+def __init__(
+self,
+*,
+job_name: str,
+max_retries: int = 75,
+poke_interval: int = 120,
+deferrable: bool = conf.getboolean("operators", "default_deferrable", 
fallback=False),
+**kwargs: Any,
+) -> None:
+super().__init__(**kwargs)
+self.job_name = job_name
+self.poke_interval = poke_interval
+self.max_retries = max_retries
+self.deferrable = deferrable
+
+def execute(self, context: Context) -> Any:
+if self.deferrable:
+self.defer(
+trigger=BedrockCustomizeModelCompletedTrigger(
+job_name=self.job_name,
+waiter_delay=int(self.poke_interval),
+waiter_max_attempts=self.max_retries,
+aws_conn_id=self.aws_conn_id,
+),
+method_name="poke",
+)
+else:
+super().execute(context=context)
+
+def poke(self, context: Context) -> bool:
+state = self.hook.get_customize_model_job_state(self.job_name)
+
+if state in self.FAILURE_STATES:
+# TODO: remove this if block when min_airflow_version is set to 
higher than 2.7.1
+if self.soft_fail:
+raise AirflowSkipException(self.FAILURE_MESSAGE)
+raise 

Re: [PR] fix: update `get_uri` to return URI in sqlAlchemy URI format [airflow]

2024-04-08 Thread via GitHub


rawwar commented on PR #38831:
URL: https://github.com/apache/airflow/pull/38831#issuecomment-2043040070

   @Taragolis , based on your comment 
[here](https://github.com/apache/airflow/issues/38195#issuecomment-2042644747)
   
   > As far as i remember, we recently discuss this case in slack. I guess 
better introduce sa_uri property (or some similar naming) which returns 
[sqlalchemy.engine.URL](https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.engine.URL)
 into the DbApiHook without any implementation, e.g. raise NotImplementedError 
and implements it per hook which have implementation for SQAlchemy. 
   
   I guess we leave the implementation of `get_uri` dependent on the provider 
type. If so, do we really need to have `sa_uri` property with 
`NotImplementedError`? Since SqlAlchemy URI is pre-defined, we can consider 
using the connection and try to form the URI. Raise an error if it has issues. 
What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Connection to Postgres failed due to special characters in a password [airflow]

2024-04-08 Thread via GitHub


rawwar commented on issue #38187:
URL: https://github.com/apache/airflow/issues/38187#issuecomment-2043038768

   @Taragolis , based on your comment 
[here](https://github.com/apache/airflow/issues/38195#issuecomment-2042644747)
   
   > As far as i remember, we recently discuss this case in slack. I guess 
better introduce sa_uri property (or some similar naming) which returns 
[sqlalchemy.engine.URL](https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.engine.URL)
 into the DbApiHook without any implementation, e.g. raise NotImplementedError 
and implements it per hook which have implementation for SQAlchemy. 
   
   I guess we leave the implementation of `get_uri` dependent on the provider 
type. If so, do we really need to have `sa_uri` property with 
`NotImplementedError`? Since SqlAlchemy URI is pre-defined, we can consider 
using the connection and try to form the URI. Raise an error if it has issues. 
What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] Change the UI color for ExternalSensor tasks [airflow]

2024-04-08 Thread via GitHub


Timelessprod opened a new issue, #38845:
URL: https://github.com/apache/airflow/issues/38845

   ### Description
   
   With the recent update of the graph display in DAGs, there's a visibility 
issue regarding external task sensors. Indeed those tasks have a `#19647e` tile 
color while the status & operator name are displayed in 
`--chakra-colors-gray-500` (aka `#718096`) on top of it, which makes the text 
unreadable. You can see a screenshot below:
   
   https://github.com/apache/airflow/assets/69579602/8497faa7-9a61-4e59-a0ff-576844cfe67f
   
   I can submit a PR for it if needed, but I'm unsure if there's a chart 
regarding which color to give to which type of operator, so please tell me. My 
idea would be to use the same tint but a lighter color like `#4db7db`, which is 
more readable (see below) but not perfect for accessibility:
   
   https://github.com/apache/airflow/assets/69579602/e2fa562b-668f-4756-9ef0-421d5da3280e
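   
   For context, a small illustrative snippet of where the tile color comes from (operators define a `ui_color` class attribute that the graph view uses; the subclass below is only an example, not a proposed fix):
   
   ```python
   from airflow.sensors.external_task import ExternalTaskSensor
   
   print(ExternalTaskSensor.ui_color)  # "#19647e", the dark teal reported above
   
   
   class LighterExternalTaskSensor(ExternalTaskSensor):
       """Example subclass using the lighter tint suggested here."""
   
       ui_color = "#4db7db"
   ```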
   
   
   ### Use case/motivation
   
   The current color makes monitoring and debugging very annoying as the text is 
unreadable. It is also a big issue for accessibility, as color-blind people may 
not be able to infer the status name from the status color. As supporting 
evidence, it was flagged as a problem by Chrome Lighthouse when I ran a test on 
the page.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Change the UI color for ExternalSensor tasks [airflow]

2024-04-08 Thread via GitHub


boring-cyborg[bot] commented on issue #38845:
URL: https://github.com/apache/airflow/issues/38845#issuecomment-2043032243

   Thanks for opening your first issue here! Be sure to follow the issue 
template! If you are willing to raise PR to address this issue please do so, no 
need to wait for approval.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] White INFO log is hard to read [airflow]

2024-04-08 Thread via GitHub


romanzdk opened a new issue, #38844:
URL: https://github.com/apache/airflow/issues/38844

   ### Apache Airflow version
   
   main (development)
   
   ### If "Other Airflow 2 version" selected, which one?
   
   2.9.0
   
   ### What happened?
   
   White INFO log is hard to read
   
   
![image](https://github.com/apache/airflow/assets/31843161/aff517de-6a21-4043-b1a3-cd13156c9682)
   
   
   ### What you think should happen instead?
   
   use normal black font color
   
   ### How to reproduce
   
   use airflow 2.9.0 with some KubernetesPodOperator outputting random INFO logs
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==8.19.0
   apache-airflow-providers-celery==3.6.1
   apache-airflow-providers-cncf-kubernetes==8.0.1
   apache-airflow-providers-common-io==1.3.0
   apache-airflow-providers-common-sql==1.11.1
   apache-airflow-providers-fab==1.0.2
   apache-airflow-providers-ftp==3.7.0
   apache-airflow-providers-http==4.10.0
   apache-airflow-providers-imap==3.5.0
   apache-airflow-providers-postgres==5.10.2
   apache-airflow-providers-smtp==1.6.1
   apache-airflow-providers-sqlite==3.7.1
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   celery executor, on-premise kubernetes 1.26.5, postgres 13
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Update dagrun.py to fix #38816 [airflow]

2024-04-08 Thread via GitHub


boring-cyborg[bot] commented on PR #38843:
URL: https://github.com/apache/airflow/pull/38843#issuecomment-2043013963

   Congratulations on your first Pull Request and welcome to the Apache Airflow 
community! If you have any issues or are unsure about anything, please check 
our Contributors' Guide 
(https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (ruff, mypy and type 
annotations). Our [pre-commits]( 
https://github.com/apache/airflow/blob/main/contributing-docs/08_static_code_checks.rst#prerequisites-for-pre-commit-hooks)
 will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in 
`docs/` directory). Adding a new operator? Check this short 
[guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst)
 Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze 
environment](https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst)
 for testing locally, it's a heavy docker but it ships with a working Airflow 
and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get 
the final approval from Committers.
   - Please follow [ASF Code of 
Conduct](https://www.apache.org/foundation/policies/conduct) for all 
communication including (but not limited to) comments on Pull Requests, Mailing 
list and Slack.
   - Be sure to read the [Airflow Coding style]( 
https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#coding-style-and-best-practices).
   - Always keep your Pull Requests rebased, otherwise your build might fail 
due to changes not related to your commits.
   Apache Airflow is a community-driven project and together we are making it 
better .
   In case of doubts contact the developers at:
   Mailing List: d...@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Update dagrun.py to fix #38816 [airflow]

2024-04-08 Thread via GitHub


MyLong opened a new pull request, #38843:
URL: https://github.com/apache/airflow/pull/38843

   'DagRunNote' object has no attribute `dag_id`/`dagrun_id`/`run_id`/`map_index`,
   so replace those attributes with the existing attributes `dag_run_id`/`content`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] OpenAI Chat & Assistant hook functions [airflow]

2024-04-08 Thread via GitHub


Lee-W commented on code in PR #38736:
URL: https://github.com/apache/airflow/pull/38736#discussion_r1556009587


##
airflow/providers/openai/hooks/openai.py:
##
@@ -77,6 +89,165 @@ def get_conn(self) -> OpenAI:
 **openai_client_kwargs,
 )
 
+def create_chat_completion(
+self,
+messages: list[
+ChatCompletionSystemMessageParam
+| ChatCompletionUserMessageParam
+| ChatCompletionAssistantMessageParam
+| ChatCompletionToolMessageParam
+| ChatCompletionFunctionMessageParam
+],
+model: str = "gpt-3.5-turbo",
+**kwargs: Any,
+) -> ChatCompletionMessage:
+"""
+Create a model response for the given chat conversation and returns 
the message.
+
+:param messages: A list of messages comprising the conversation so far
+:param model: ID of the model to use
+"""
+response = self.conn.chat.completions.create(model=model, 
messages=messages, **kwargs)
+return response.choices[0].message
+
+def create_assistant(self, model: str = "gpt-3.5-turbo", **kwargs: Any) -> 
Assistant:
+"""Create an OpenAI assistant using the given model.
+
+:param model: The OpenAI model for the assistant to use.
+"""
+assistant = self.conn.beta.assistants.create(model=model, **kwargs)
+return assistant
+
+def get_assistant(self, assistant_id: str) -> Assistant:
+"""
+Get an OpenAI assistant.
+
+:param assistant_id: The ID of the assistant to retrieve.
+"""
+assistant = 
self.conn.beta.assistants.retrieve(assistant_id=assistant_id)
+return assistant
+
+def get_assistants(self, **kwargs) -> list[Assistant]:
+"""Get a list of Assistant objects."""
+assistants = self.conn.beta.assistants.list(**kwargs)
+return assistants.data
+
+def get_assistant_with_name(self, assistant_name: str) -> Assistant | None:

Review Comment:
   ```suggestion
   def get_assistant_by_name(self, assistant_name: str) -> Assistant | None:
   ```
   
   Not sure whether it sounds more reasonable 🤔 



##
airflow/providers/openai/hooks/openai.py:
##
@@ -77,6 +89,165 @@ def get_conn(self) -> OpenAI:
 **openai_client_kwargs,
 )
 
+def create_chat_completion(
+self,
+messages: list[
+ChatCompletionSystemMessageParam
+| ChatCompletionUserMessageParam
+| ChatCompletionAssistantMessageParam
+| ChatCompletionToolMessageParam
+| ChatCompletionFunctionMessageParam
+],
+model: str = "gpt-3.5-turbo",
+**kwargs: Any,
+) -> ChatCompletionMessage:
+"""
+Create a model response for the given chat conversation and returns 
the message.
+
+:param messages: A list of messages comprising the conversation so far
+:param model: ID of the model to use
+"""
+response = self.conn.chat.completions.create(model=model, 
messages=messages, **kwargs)
+return response.choices[0].message
+
+def create_assistant(self, model: str = "gpt-3.5-turbo", **kwargs: Any) -> 
Assistant:
+"""Create an OpenAI assistant using the given model.
+
+:param model: The OpenAI model for the assistant to use.
+"""
+assistant = self.conn.beta.assistants.create(model=model, **kwargs)
+return assistant
+
+def get_assistant(self, assistant_id: str) -> Assistant:
+"""
+Get an OpenAI assistant.
+
+:param assistant_id: The ID of the assistant to retrieve.
+"""
+assistant = 
self.conn.beta.assistants.retrieve(assistant_id=assistant_id)
+return assistant
+
+def get_assistants(self, **kwargs) -> list[Assistant]:

Review Comment:
   ```suggestion
   def get_assistants(self, **kwargs: Any) -> list[Assistant]:
   ```
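   
   For context, a minimal usage sketch of the hook methods shown in the diff above (assuming the hook is constructed with a `conn_id` pointing at a configured OpenAI connection; this is not part of the reviewed change):
   
   ```python
   from airflow.providers.openai.hooks.openai import OpenAIHook
   
   hook = OpenAIHook(conn_id="openai_default")
   
   # Chat completion: messages are plain role/content dicts accepted by the OpenAI client.
   reply = hook.create_chat_completion(
       messages=[{"role": "user", "content": "Summarize what Apache Airflow does."}],
       model="gpt-3.5-turbo",
   )
   print(reply.content)
   
   # Assistants: create one, then look it up again by name.
   assistant = hook.create_assistant(model="gpt-3.5-turbo", name="airflow-helper")
   found = hook.get_assistant_with_name("airflow-helper")  # or get_assistant_by_name if renamed
   ```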



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] fix: try002 for provider google [airflow]

2024-04-08 Thread via GitHub


dondaum commented on code in PR #38803:
URL: https://github.com/apache/airflow/pull/38803#discussion_r1556001809


##
airflow/providers/google/cloud/hooks/dataflow.py:
##
@@ -252,7 +252,7 @@ def _get_current_jobs(self) -> list[dict]:
 self._job_id = jobs[0]["id"]
 return jobs
 else:
-raise Exception("Missing both dataflow job ID and name.")
+raise AirflowException("Missing both dataflow job ID and name.")

Review Comment:
   Good point. I checked again and yes, it makes sense to me to use 
`ValueError` instead. Changed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] Applying queue with SparkSubmitOperator is ignored from 2.6.0 onwards [airflow]

2024-04-08 Thread via GitHub


pateash commented on issue #38461:
URL: https://github.com/apache/airflow/issues/38461#issuecomment-2042992467

   Checking this - can anyone please assign it to me?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Expose AWS IAM missing param in Hashicorp secret [airflow]

2024-04-08 Thread via GitHub


Lee-W commented on code in PR #38536:
URL: https://github.com/apache/airflow/pull/38536#discussion_r1555990209


##
airflow/providers/hashicorp/provider.yaml:
##
@@ -55,6 +55,7 @@ versions:
 dependencies:
   - apache-airflow>=2.6.0
   - hvac>=1.1.0
+  - apache-airflow-providers-amazon>=8.0.0

Review Comment:
   Should (or could) we make this an extra dep for those who need to use aws?
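   
   One possible shape of the "extra dependency" approach, sketched under assumptions (a lazy import so the Amazon provider is only needed when AWS IAM auth is actually configured; this is not the PR's code):
   
   ```python
   def _aws_credentials(region_name: str):
       """Fetch AWS credentials only when the aws auth type is actually used."""
       try:
           from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
       except ImportError as e:
           raise ImportError(
               "AWS IAM auth for the Hashicorp Vault backend requires "
               "apache-airflow-providers-amazon; install it (or an aws extra) "
               "to use this auth type."
           ) from e
       return AwsBaseHook(client_type="sts", region_name=region_name).get_credentials()
   ```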



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Bugfix to correct GCSHook being called even when not required with BeamRunPythonPipelineOperator [airflow]

2024-04-08 Thread via GitHub


zstrathe commented on code in PR #38716:
URL: https://github.com/apache/airflow/pull/38716#discussion_r1555982325


##
airflow/providers/apache/beam/operators/beam.py:
##
@@ -364,11 +364,13 @@ def execute(self, context: Context):
 
 def execute_sync(self, context: Context):
 with ExitStack() as exit_stack:
-gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
 if self.py_file.lower().startswith("gs://"):
+gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
 tmp_gcs_file = 
exit_stack.enter_context(gcs_hook.provide_file(object_url=self.py_file))
 self.py_file = tmp_gcs_file.name
 if self.snake_case_pipeline_options.get("requirements_file", 
"").startswith("gs://"):
+if 'gcs_hook' not in locals():

Review Comment:
   @e-galan thanks for the feedback! I will go ahead and remove that check then.
   
   In addition, after looking through the code some more, it looks like any 
other GCS resources may not be instantiated if supplied as pipeline_options? 
For example, the below from 
tests/system/providers/apache/beam/example_python.py:
   ```
   start_python_pipeline_direct_runner = BeamRunPythonPipelineOperator(
   task_id="start_python_pipeline_direct_runner",
   py_file=GCS_PYTHON,
   py_options=[],
   pipeline_options={"output": GCS_OUTPUT},
   py_requirements=["apache-beam[gcp]==2.46.0"],
   py_interpreter="python3",
   py_system_site_packages=False,
   )
   ```
   Would  ```pipeline_options={"output": GCS_OUTPUT}``` correctly result in the 
output file being updated in GCS? 
   
   If not, I think that somehow every value in pipeline_options should be 
recursively parsed for conversion to a GCS resource, with the complication that 
"output" files would need to utilize ```GCSHook.provide_file_and_upload()``` 
instead of ```GCSHook.provide_file()```. And I think that could be solved by 
adding another file prefix to distinguish GCS resources that need to be 
uploaded (i.e., ```"gcs-upload://"```), and updating the docs to note that.
   
   If this all sounds correct, I'd be happy to add to this PR.
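   
   To make the idea concrete, a single-level sketch of such a pass over pipeline_options (hypothetical helper, not code from this PR; it reuses `GCSHook.provide_file` from the diff above and only stages string values that start with `gs://` - a full version would also need to handle nested structures and the upload case):
   
   ```python
   from contextlib import ExitStack
   
   from airflow.providers.google.cloud.hooks.gcs import GCSHook
   
   
   def stage_gcs_inputs(pipeline_options: dict, gcp_conn_id: str, exit_stack: ExitStack) -> dict:
       """Download gs:// inputs to temp files and return options pointing at local paths."""
       gcs_hook = None
       staged = {}
       for key, value in pipeline_options.items():
           if isinstance(value, str) and value.lower().startswith("gs://"):
               if gcs_hook is None:
                   gcs_hook = GCSHook(gcp_conn_id=gcp_conn_id)
               tmp_file = exit_stack.enter_context(gcs_hook.provide_file(object_url=value))
               staged[key] = tmp_file.name
           else:
               staged[key] = value
       return staged
   ```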



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


