Re: [I] White INFO log is hard to read [airflow]
tirkarthi commented on issue #38844: URL: https://github.com/apache/airflow/issues/38844#issuecomment-2044202208 Do the logs returned from the API contain ANSI codes that cause this? cc: @jscheffl -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
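A minimal sketch (not from the thread) for testing that hypothesis, assuming the API logs carry SGR color sequences such as `\x1b[37m` (white); the regex below covers only color/reset codes:

```python
import re

# Matches ANSI SGR sequences like "\x1b[37m" (white) and "\x1b[0m" (reset).
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def strip_ansi(line: str) -> str:
    return ANSI_SGR.sub("", line)

print(strip_ansi("\x1b[37mINFO\x1b[0m - task started"))  # -> "INFO - task started"
```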
Re: [I] clear a deferred task do not increment the tries [airflow]
tirkarthi commented on issue #38735: URL: https://github.com/apache/airflow/issues/38735#issuecomment-2044200109

When a task is deferred, `_try_number` is decremented; when the trigger yields an event and the task resumes execution, `_try_number` is incremented again, so the same try_number is used before and after the deferral. Perhaps when the task being cleared is in the deferred state, the try_number can be incremented.

```diff
commit f08bc8d1326f2c7c271928d74804050c36b4e953
Author: Karthikeyan Singaravelan
Date:   Tue Apr 9 11:10:06 2024 +0530

    Increment try_number for cleared deferred tasks.

diff --git a/airflow/models/taskinstance.py b/airflow/models/taskinstance.py
index d52a71c5b2..62b4d3e821 100644
--- a/airflow/models/taskinstance.py
+++ b/airflow/models/taskinstance.py
@@ -274,6 +274,12 @@ def clear_task_instances(
             ti.state = TaskInstanceState.RESTARTING
             job_ids.append(ti.job_id)
         else:
+            # When the task is deferred the try_number is decremented so that the same try
+            # number is used when the task handles the event. But in case of clearing the try
+            # number should be incremented so that the next run doesn't reuse the same try
+            if ti.state == TaskInstanceState.DEFERRED:
+                ti._try_number += 1
+
             ti_dag = dag if dag and dag.dag_id == ti.dag_id else dag_bag.get_dag(ti.dag_id, session=session)
             task_id = ti.task_id
             if ti_dag and ti_dag.has_task(task_id):
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
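For illustration, a hypothetical walk-through of the counter described above; the values are assumptions, not taken from the issue:

```python
# Hypothetical lifecycle of _try_number around deferral, per the description above.
try_number = 1   # task starts its first try
try_number -= 1  # task defers: decremented to 0
try_number += 1  # trigger fires, task resumes: back to 1, i.e. the same try

# With the proposed patch, clearing a task instance in DEFERRED state also
# increments the counter, so the next run gets a fresh try number:
try_number += 1  # cleared while deferred -> next run records try 2
print(try_number)  # 2
```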
Re: [I] timeout on Dynamic Task mapping : skipped inner task , task status is still success [airflow]
raphaelauv commented on issue #37332: URL: https://github.com/apache/airflow/issues/37332#issuecomment-2044184401 @ephraimbuddy could you please remove the "wait for response" tag, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Make _get_ti compatible with RPC [airflow]
dstandish commented on code in PR #38570: URL: https://github.com/apache/airflow/pull/38570#discussion_r1556927388 ## airflow/serialization/pydantic/dag_run.py: ## @@ -78,7 +78,7 @@ def get_task_instances( def get_task_instance( self, task_id: str, -session: Session, +session: Session | None = None, Review Comment: I think this may be unnecessary; will revert and see what happens. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Make _get_ti compatible with RPC [airflow]
dstandish commented on code in PR #38570: URL: https://github.com/apache/airflow/pull/38570#discussion_r1556922160 ## airflow/cli/commands/task_command.py: ## @@ -156,8 +156,10 @@ def _get_dag_run( raise ValueError(f"unknown create_if_necessary value: {create_if_necessary!r}") +@internal_api_call @provide_session -def _get_ti( +def _get_ti_db_access( +dag: DAG, task: Operator, Review Comment: ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Make _get_ti compatible with RPC [airflow]
dstandish commented on code in PR #38570: URL: https://github.com/apache/airflow/pull/38570#discussion_r1556918483 ## airflow/serialization/serialized_objects.py: ## @@ -1462,9 +1462,15 @@ def deserialize_dag(cls, encoded_dag: dict[str, Any]) -> SerializedDAG: v = set(v) elif k == "tasks": SerializedBaseOperator._load_operator_extra_links = cls._load_operator_extra_links - -v = {task["task_id"]: SerializedBaseOperator.deserialize_operator(task) for task in v} +tasks = {} +for obj in v: +if obj.get(Encoding.TYPE) == DAT.OP: +deser = SerializedBaseOperator.deserialize_operator(obj[Encoding.VAR]) +tasks[deser.task_id] = deser +else: # this is backcompat for pre-2.10 +tasks[obj["task_id"]] = SerializedBaseOperator.deserialize_operator(obj) Review Comment: for better or worse, default setting is to _not_ reserialize... https://github.com/apache/airflow/blob/fecc1ed8eff5192818bbe04cbbdfe9585eaab583/airflow/cli/cli_config.py#L646-L653 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Unable to find Spark Support in helm chart [airflow]
boring-cyborg[bot] commented on issue #38851: URL: https://github.com/apache/airflow/issues/38851#issuecomment-2044123456 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] Unable to find Spark Support in helm chart [airflow]
anandlingaraj opened a new issue, #38851: URL: https://github.com/apache/airflow/issues/38851

### Apache Airflow version

2.9.0

### If "Other Airflow 2 version" selected, which one?

apache/airflow:2.8.3

### What happened?

When selecting the connection type for Spark, it's missing.

### What you think should happen instead?

Expect Spark connections to be available by default.

### How to reproduce

Install Apache Spark in AKS/MiniKube. Try to add a connection and select Spark.

### Operating System

Docker

### Versions of Apache Airflow Providers

_No response_

### Deployment

Official Apache Airflow Helm Chart

### Deployment details

_No response_

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] fix: try002 for provider cncf kubernetes [airflow]
eladkal commented on PR #38799: URL: https://github.com/apache/airflow/pull/38799#issuecomment-2044119678 There are conflicts to resolve -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Add four unit tests for aws/utils [airflow]
slycyberguy commented on code in PR #38820: URL: https://github.com/apache/airflow/pull/38820#discussion_r1556846390 ## tests/providers/amazon/aws/utils/test_sagemaker.py: ## Review Comment: I removed those tests. I had figured they were necessary because those services were listed in issue #35442. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Paginate Airflow task logs [airflow]
uranusjr commented on code in PR #38807: URL: https://github.com/apache/airflow/pull/38807#discussion_r1556841592 ## airflow/providers/amazon/aws/log/s3_task_handler.py: ## @@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str: :return: the log found at the remote_log_location """ try: -return self.hook.read_key(remote_log_location) +range: str = None +if page_number is not None: +page_size = 1024 * 100 # TODO: Create config for page_size Review Comment: I think what Ash means is we take the Range header in the API, and forward it to the log server. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
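For illustration, a rough sketch of what that forwarding could look like on the S3 side; the helper name and the direct boto3 call are assumptions (the handler above goes through `self.hook.read_key` instead):

```python
import boto3

def s3_read_range(bucket: str, key: str, byte_range: str) -> str:
    """Fetch only the requested byte range, e.g. byte_range="bytes=0-102399" for the first 100 KiB."""
    client = boto3.client("s3")
    response = client.get_object(Bucket=bucket, Key=key, Range=byte_range)
    return response["Body"].read().decode("utf-8")
```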
Re: [PR] Add the deferrable mode to the Dataflow sensors [airflow]
Lee-W commented on code in PR #37693: URL: https://github.com/apache/airflow/pull/37693#discussion_r1556794107 ## airflow/providers/google/cloud/sensors/dataflow.py: ## @@ -194,27 +244,72 @@ def poke(self, context: Context) -> bool: project_id=self.project_id, location=self.location, ) +return result["metrics"] if self.callback is None else self.callback(result["metrics"]) + +def execute(self, context: Context) -> Any: +"""Airflow runs this method on the worker and defers using the trigger.""" +if not self.deferrable: +super().execute(context) +else: +self.defer( +timeout=self.execution_timeout, +trigger=DataflowJobMetricsTrigger( +job_id=self.job_id, +project_id=self.project_id, +location=self.location, +gcp_conn_id=self.gcp_conn_id, +poll_sleep=self.poll_interval, +impersonation_chain=self.impersonation_chain, +fail_on_terminal_state=self.fail_on_terminal_state, +), +method_name="execute_complete", +) -return self.callback(result["metrics"]) +def execute_complete(self, context: Context, event: dict[str, str | list]) -> Any: +""" +Execute this method when the task resumes its execution on the worker after deferral. + +If the trigger returns an event with success status - passes the event result to the callback function. +Returns the event result if no callback function is provided. + +If the trigger returns an event with error status - raises an exception. +""" +if event["status"] == "success": +self.log.info(event["message"]) +return event["result"] if self.callback is None else self.callback(event["result"]) +if self.soft_fail: Review Comment: ```suggestion # TODO: remove this if block when min_airflow_version is set to higher than 2.7.1 if self.soft_fail: ``` ## airflow/providers/google/cloud/sensors/dataflow.py: ## @@ -115,10 +124,50 @@ def poke(self, context: Context) -> bool: return False +def execute(self, context: Context) -> None: +"""Airflow runs this method on the worker and defers using the trigger.""" +if not self.deferrable: +super().execute(context) +elif not self.poke(context=context): +self.defer( +timeout=self.execution_timeout, +trigger=DataflowJobStatusTrigger( +job_id=self.job_id, +expected_statuses=self.expected_statuses, +project_id=self.project_id, +location=self.location, +gcp_conn_id=self.gcp_conn_id, +poll_sleep=self.poll_interval, +impersonation_chain=self.impersonation_chain, +), +method_name="execute_complete", +) + +def execute_complete(self, context: Context, event: dict[str, str | list]) -> bool: +""" +Execute this method when the task resumes its execution on the worker after deferral. + +Returns True if the trigger returns an event with the success status, otherwise raises
+an exception. +""" +if event["status"] == "success": +self.log.info(event["message"]) +return True +if self.soft_fail: Review Comment: ```suggestion # TODO: remove this if block when min_airflow_version is set to higher than 2.7.1 if self.soft_fail: ``` ## airflow/providers/google/cloud/sensors/dataflow.py: ## @@ -357,4 +498,45 @@ def poke(self, context: Context) -> bool: location=self.location, ) -return self.callback(result) +return result if self.callback is None else self.callback(result) + +def execute(self, context: Context) -> Any: +"""Airflow runs this method on the worker and defers using the trigger.""" +if not self.deferrable: +super().execute(context) +else: +self.defer( +trigger=DataflowJobAutoScalingEventTrigger( +job_id=self.job_id, +project_id=self.project_id, +location=self.location, +gcp_conn_id=self.gcp_conn_id, +poll_sleep=self.poll_interval, +impersonation_chain=self.impersonation_chain, +fail_on_terminal_state=self.fail_on_terminal_state, +), +method_name="execute_complete", +) + +def execute_complete(self, context: Context, event: dict[str, str | list]) -> Any: +""" +Execute this
Re: [PR] Always use the executemany method when inserting rows in DbApiHook as it's way much faster [airflow]
uranusjr commented on code in PR #38715: URL: https://github.com/apache/airflow/pull/38715#discussion_r1556827871 ## airflow/providers/common/sql/hooks/sql.py: ## @@ -166,6 +175,7 @@ def __init__(self, *args, schema: str | None = None, log_sql: bool = True, **kwa # Hook deriving from the DBApiHook to still have access to the field in its constructor self.__schema = schema self.log_sql = log_sql +self._fast_executemany = fast_executemany Review Comment: Should this also be set on the class instead? (`supports_fast_executemany`?) Or does this depend on some information that’s only available at runtime? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
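For illustration, a sketch of the two designs being weighed; every class and attribute name here is an assumption, not the provider's actual API:

```python
class DbApiHookSketch:
    # Class-level capability flag: a static property of the hook type,
    # analogous to flags like supports_autocommit.
    supports_fast_executemany: bool = False


class OdbcHookSketch(DbApiHookSketch):
    supports_fast_executemany = True  # known statically for this driver


class RuntimeFlagHookSketch(DbApiHookSketch):
    def __init__(self, fast_executemany: bool = False) -> None:
        # Instance-level flag: only needed if the value depends on
        # information available at runtime (e.g. connection extras).
        self._fast_executemany = fast_executemany
```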
Re: [PR] Add the deferrable mode to the Dataflow sensors [airflow]
Lee-W commented on code in PR #37693: URL: https://github.com/apache/airflow/pull/37693#discussion_r1556763479 ## airflow/providers/google/cloud/sensors/dataflow.py: ## @@ -151,12 +205,14 @@ def __init__( self, *, job_id: str, -callback: Callable[[dict], bool], +callback: Callable | None = None, Review Comment: Got it. If it can be an arbitrary Callable, this change makes sense. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Deferrable mode for Custom Training Job operators [airflow]
Lee-W commented on code in PR #38584: URL: https://github.com/apache/airflow/pull/38584#discussion_r1556745491 ## airflow/providers/google/cloud/operators/vertex_ai/custom_job.py: ## @@ -539,6 +564,94 @@ def on_kill(self) -> None: if self.hook: self.hook.cancel_job() +def execute_complete(self, context: Context, event: dict[str, Any]) -> dict[str, Any] | None: +if event["status"] == "error": +raise AirflowException(event["message"]) +result = event["job"] +model_id = self.hook.extract_model_id_from_training_pipeline(result) +custom_job_id = self.hook.extract_custom_job_id_from_training_pipeline(result) +self.xcom_push(context, key="model_id", value=model_id) +VertexAIModelLink.persist(context=context, task_instance=self, model_id=model_id) +# push custom_job_id to xcom so it could be pulled by other tasks +self.xcom_push(context, key="custom_job_id", value=custom_job_id) +return result + +def invoke_defer(self, context: Context) -> None: +custom_container_training_job_obj: CustomContainerTrainingJob = self.hook.submit_custom_container_training_job( +project_id=self.project_id, +region=self.region, +display_name=self.display_name, +command=self.command, +container_uri=self.container_uri, + model_serving_container_image_uri=self.model_serving_container_image_uri, + model_serving_container_predict_route=self.model_serving_container_predict_route, + model_serving_container_health_route=self.model_serving_container_health_route, + model_serving_container_command=self.model_serving_container_command, +model_serving_container_args=self.model_serving_container_args, + model_serving_container_environment_variables=self.model_serving_container_environment_variables, +model_serving_container_ports=self.model_serving_container_ports, +model_description=self.model_description, +model_instance_schema_uri=self.model_instance_schema_uri, +model_parameters_schema_uri=self.model_parameters_schema_uri, +model_prediction_schema_uri=self.model_prediction_schema_uri, +parent_model=self.parent_model, +is_default_version=self.is_default_version, +model_version_aliases=self.model_version_aliases, +model_version_description=self.model_version_description, +labels=self.labels, + training_encryption_spec_key_name=self.training_encryption_spec_key_name, +model_encryption_spec_key_name=self.model_encryption_spec_key_name, +staging_bucket=self.staging_bucket, +# RUN +dataset=Dataset(name=self.dataset_id) if self.dataset_id else None, +annotation_schema_uri=self.annotation_schema_uri, +model_display_name=self.model_display_name, +model_labels=self.model_labels, +base_output_dir=self.base_output_dir, +service_account=self.service_account, +network=self.network, +bigquery_destination=self.bigquery_destination, +args=self.args, +environment_variables=self.environment_variables, +replica_count=self.replica_count, +machine_type=self.machine_type, +accelerator_type=self.accelerator_type, +accelerator_count=self.accelerator_count, +boot_disk_type=self.boot_disk_type, +boot_disk_size_gb=self.boot_disk_size_gb, +training_fraction_split=self.training_fraction_split, +validation_fraction_split=self.validation_fraction_split, +test_fraction_split=self.test_fraction_split, +training_filter_split=self.training_filter_split, +validation_filter_split=self.validation_filter_split, +test_filter_split=self.test_filter_split, +predefined_split_column_name=self.predefined_split_column_name, +timestamp_split_column_name=self.timestamp_split_column_name, +tensorboard=self.tensorboard, +)
+custom_container_training_job_obj.wait_for_resource_creation() Review Comment: Yep, sounds good! ## tests/providers/google/cloud/operators/test_vertex_ai.py: ## @@ -88,11 +88,18 @@ ListPipelineJobOperator, RunPipelineJobOperator, ) -from airflow.providers.google.cloud.triggers.vertex_ai import RunPipelineJobTrigger +from airflow.providers.google.cloud.triggers.vertex_ai import ( +CustomContainerTrainingJobTrigger, +CustomPythonPackageTrainingJobTrigger, +CustomTrainingJobTrigger, +RunPipelineJobTrigger, +) from airflow.utils import timezone VERTEX_AI_PATH = "airflow.providers.google.cloud.operators.vertex_ai.{}" VERTEX_AI_LINKS_PATH =
Re: [PR] Fix AttributeError: 'DagRunNote' object has no attribute 'dag_id' [airflow]
MyLong commented on code in PR #38843: URL: https://github.com/apache/airflow/pull/38843#discussion_r1556746373 ## airflow/models/dagrun.py: ## @@ -1648,7 +1648,5 @@ def __init__(self, content, user_id=None): self.user_id = user_id def __repr__(self): -prefix = f"<{self.__class__.__name__}: {self.dag_id}.{self.dagrun_id} {self.run_id}" -if self.map_index != -1: -prefix += f" map_index={self.map_index}" +prefix = f"<{self.__class__.__name__}: {self.dag_run_id}.{self.content}" Review Comment: ok, I will try it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] timeout on Dynamic Task mapping : skipped inner task , task status is still success [airflow]
github-actions[bot] commented on issue #37332: URL: https://github.com/apache/airflow/issues/37332#issuecomment-2043917725 This issue has been automatically marked as stale because it has been open for 14 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Make ImapHook mail body encoding configurable [airflow]
github-actions[bot] commented on PR #37517: URL: https://github.com/apache/airflow/pull/37517#issuecomment-2043917704 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] GlueJobOperator failing with Invalid type for parameter GlueVersion for provider version >= 7.1.0. continuation of #32404 [airflow]
don1uppa commented on issue #38848: URL: https://github.com/apache/airflow/issues/38848#issuecomment-2043872862 Thank you, the literal option worked. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix AttributeError: 'DagRunNote' object has no attribute 'dag_id' [airflow]
hussein-awala commented on code in PR #38843: URL: https://github.com/apache/airflow/pull/38843#discussion_r1556607661 ## airflow/models/dagrun.py: ## @@ -1648,7 +1648,5 @@ def __init__(self, content, user_id=None): self.user_id = user_id def __repr__(self): -prefix = f"<{self.__class__.__name__}: {self.dag_id}.{self.dagrun_id} {self.run_id}" -if self.map_index != -1: -prefix += f" map_index={self.map_index}" +prefix = f"<{self.__class__.__name__}: {self.dag_run_id}.{self.content}" Review Comment: This information is not sufficient; we can use the `dag_run` relationship to fetch the dag_id and the run_id. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
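A sketch of what the suggested `__repr__` could look like when going through the relationship, written in the fragment style of a review suggestion; the exact fields shown are an assumption:

```python
def __repr__(self):
    # DagRunNote has no dag_id/run_id columns of its own, so fetch them via the
    # dag_run relationship (note: this may trigger a lazy load outside a session).
    return f"<{self.__class__.__name__}: {self.dag_run.dag_id}.{self.dag_run.run_id}>"
```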
Re: [PR] Amazon Bedrock - Model Throughput Provisioning [airflow]
ferruzzi commented on code in PR #38850: URL: https://github.com/apache/airflow/pull/38850#discussion_r1556503642 ## airflow/providers/amazon/aws/operators/bedrock.py: ## @@ -25,7 +25,10 @@ from airflow.exceptions import AirflowException Review Comment: This comment is just a safety gate to make sure this PR doesn't get merged out of order; anyone can feel free to resolve it once https://github.com/apache/airflow/pull/38849 is merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Amazon Bedrock - Model Throughput Provisioning [airflow]
ferruzzi opened a new pull request, #38850: URL: https://github.com/apache/airflow/pull/38850 Adds support for provisioning model throughput on Amazon Bedrock including: operators, sensors, waiters, and triggerer, as well as unit tests and system tests for all of the above, and doc updates. Manually tested in Breeze with the Trigger and with the Sensor: ![image](https://github.com/apache/airflow/assets/1920178/97573595-935a-43e9-a0ad-be223d6d53b9) Sits on top of a change in https://github.com/apache/airflow/pull/38849, please make sure that is merged before this one. --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Amazon Bedrock - Clean up hook unit tests [airflow]
ferruzzi opened a new pull request, #38849: URL: https://github.com/apache/airflow/pull/38849 @vincbeck and @Taragolis requested some changes to the Bedrock hooks in https://github.com/apache/airflow/pull/38693 and I missed the opportunity to greatly simplify the related unit tests. Static checks for it are currently failing locally, but that looks like an issue with `tests/providers/common/sql/test_utils.py:45:`, which @Taragolis and @dondaum are maybe already working on? --- **^ Add meaningful description above** Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#pull-request-guidelines)** for more information. In case of fundamental code changes, an Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals)) is needed. In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Default credentials cause warnings on Google Chrome after logging in to Airflow's UI [airflow]
Taragolis commented on issue #38797: URL: https://github.com/apache/airflow/issues/38797#issuecomment-2043738184 > but unfortunately, Chrome is not an open source, so I'm not sure what the address is in this case. AFAIK, Chrome bugs can be reported/tracked on the Chromium bug tracker: https://issues.chromium.org/issues There are also some closely related bugs already open: https://issues.chromium.org/issues?q=status:open%20data%20breach -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow-site) branch gh-pages updated (613224189f -> 21f20bf658)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch gh-pages in repository https://gitbox.apache.org/repos/asf/airflow-site.git discard 613224189f Rewritten history to remove past gh-pages deployments new 21f20bf658 Rewritten history to remove past gh-pages deployments This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (613224189f) \ N -- N -- N refs/heads/gh-pages (21f20bf658) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: blog/airflow-1.10.10/index.html| 4 +- blog/airflow-1.10.12/index.html| 4 +- blog/airflow-1.10.8-1.10.9/index.html | 4 +- blog/airflow-2.2.0/index.html | 4 +- blog/airflow-2.3.0/index.html | 4 +- blog/airflow-2.4.0/index.html | 4 +- blog/airflow-2.5.0/index.html | 4 +- blog/airflow-2.6.0/index.html | 4 +- blog/airflow-2.7.0/index.html | 4 +- blog/airflow-2.8.0/index.html | 4 +- .../airflow-2.9.0/custom-log-grouping-expanded.png | Bin 221429 -> 222171 bytes blog/airflow-2.9.0/index.html | 6 +- blog/airflow-survey-2020/index.html| 4 +- blog/airflow-survey-2022/index.html| 4 +- blog/airflow-survey/index.html | 4 +- blog/airflow-two-point-oh-is-here/index.html | 4 +- blog/airflow_summit_2021/index.html| 4 +- blog/airflow_summit_2022/index.html| 4 +- blog/announcing-new-website/index.html | 4 +- blog/apache-airflow-for-newcomers/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- blog/fab-oid-vulnerability/index.html | 4 +- .../index.html | 4 +- blog/index.xml | 2 +- blog/introducing_setup_teardown/index.html | 4 +- .../index.html | 4 +- blog/tags/release/index.xml| 2 +- index.xml | 2 +- search/index.html | 4 +- sitemap.xml| 136 ++--- use-cases/adobe/index.html | 4 +- use-cases/adyen/index.html | 4 +- use-cases/big-fish-games/index.html| 4 +- use-cases/business_operations/index.html | 4 +- use-cases/dish/index.html | 4 +- use-cases/etl_analytics/index.html | 4 +- use-cases/experity/index.html | 4 +- use-cases/infrastructure-management/index.html | 4 +- use-cases/mlops/index.html | 4 +- use-cases/onefootball/index.html | 4 +- use-cases/plarium-krasnodar/index.html | 4 +- use-cases/seniorlink/index.html| 4 +- use-cases/sift/index.html | 4 +- use-cases/snapp/index.html | 4 +- use-cases/suse/index.html | 4 +- 48 files changed, 158 insertions(+), 158 deletions(-)
(airflow) branch constraints-main updated: Updating constraints. Github run id:8606550300
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch constraints-main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/constraints-main by this push: new 2c0a5b33b8 Updating constraints. Github run id:8606550300 2c0a5b33b8 is described below commit 2c0a5b33b8dc9120ffd2cd22074266ad85f2f157 Author: Automated GitHub Actions commit AuthorDate: Mon Apr 8 21:50:54 2024 + Updating constraints. Github run id:8606550300 This update in constraints is automatically committed by the CI 'constraints-push' step based on 'refs/heads/main' in the 'apache/airflow' repository with commit sha 94153d70ac894d7c5249d183304646995d5df3e4. The action that build those constraints can be found at https://github.com/apache/airflow/actions/runs/8606550300/ The image tag used for that build was: 94153d70ac894d7c5249d183304646995d5df3e4. You can enter Breeze environment with this image by running 'breeze shell --image-tag 94153d70ac894d7c5249d183304646995d5df3e4' All tests passed in this build so we determined we can push the updated constraints. See https://github.com/apache/airflow/blob/main/README.md#installing-from-pypi for details. --- constraints-3.10.txt | 10 +- constraints-3.11.txt | 10 +- constraints-3.12.txt | 8 constraints-3.8.txt | 10 +- constraints-3.9.txt | 10 +- constraints-no-providers-3.10.txt | 5 - constraints-no-providers-3.11.txt | 5 - constraints-no-providers-3.12.txt | 5 - constraints-no-providers-3.8.txt | 5 - constraints-no-providers-3.9.txt | 5 - constraints-source-providers-3.10.txt | 10 +- constraints-source-providers-3.11.txt | 10 +- constraints-source-providers-3.12.txt | 8 constraints-source-providers-3.8.txt | 10 +- constraints-source-providers-3.9.txt | 10 +- 15 files changed, 68 insertions(+), 53 deletions(-) diff --git a/constraints-3.10.txt b/constraints-3.10.txt index a8fb9bd639..0a69f53ef5 100644 --- a/constraints-3.10.txt +++ b/constraints-3.10.txt @@ -1,6 +1,6 @@ # -# This constraints file was automatically generated on 2024-04-07T16:34:04.830710 +# This constraints file was automatically generated on 2024-04-08T21:21:58.044404 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation.
@@ -182,7 +182,7 @@ apache-airflow-providers-vertica==3.7.1 apache-airflow-providers-weaviate==1.3.3 apache-airflow-providers-yandex==3.9.0 apache-airflow-providers-zendesk==4.6.0 -apache-beam==2.55.0 +apache-beam==2.55.1 apispec==6.6.0 apprise==1.7.5 argcomplete==3.2.3 @@ -291,7 +291,7 @@ eralchemy2==1.3.8 et-xmlfile==1.1.0 eventlet==0.36.1 exceptiongroup==1.2.0 -execnet==2.1.0 +execnet==2.1.1 executing==2.0.1 facebook_business==19.0.2 fastavro==1.9.4 @@ -316,7 +316,7 @@ google-api-python-client==2.125.0 google-auth-httplib2==0.2.0 google-auth-oauthlib==1.2.0 google-auth==2.29.0 -google-cloud-aiplatform==1.46.0 +google-cloud-aiplatform==1.47.0 google-cloud-appengine-logging==1.4.3 google-cloud-audit-log==0.2.5 google-cloud-automl==2.13.3 @@ -691,7 +691,7 @@ types-certifi==2021.10.8.3 types-croniter==2.0.0.20240321 types-docutils==0.20.0.20240406 types-paramiko==3.4.0.20240311 -types-protobuf==4.24.0.20240311 +types-protobuf==4.24.0.20240408 types-pyOpenSSL==24.0.0.20240311 types-python-dateutil==2.9.0.20240316 types-python-slugify==8.0.2.20240310 diff --git a/constraints-3.11.txt b/constraints-3.11.txt index de5cb9bd4e..434d455fe5 100644 --- a/constraints-3.11.txt +++ b/constraints-3.11.txt @@ -1,6 +1,6 @@ # -# This constraints file was automatically generated on 2024-04-07T16:34:04.961048 +# This constraints file was automatically generated on 2024-04-08T21:21:58.301866 # via "eager-upgrade" mechanism of PIP. For the "main" branch of Airflow. # This variant of constraints install uses the HEAD of the branch version for 'apache-airflow' but installs # the providers from PIP-released packages at the moment of the constraint generation. @@ -182,7 +182,7 @@ apache-airflow-providers-vertica==3.7.1 apache-airflow-providers-weaviate==1.3.3 apache-airflow-providers-yandex==3.9.0 apache-airflow-providers-zendesk==4.6.0 -apache-beam==2.55.0 +apache-beam==2.55.1 apispec==6.6.0 apprise==1.7.5 argcomplete==3.2.3 @@ -289,7 +289,7 @@ entrypoints==0.4 eralchemy2==1.3.8 et-xmlfile==1.1.0 eventlet==0.36.1 -execnet==2.1.0 +execnet==2.1.1 executing==2.0.1 facebook_business==19.0.2 fastavro==1.9.4 @@ -314,7 +314,7 @@ google-api-python-client==2.125.0
(airflow-site) branch main updated: Fix typo in 2.9 blog post (#1000)
This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow-site.git The following commit(s) were added to refs/heads/main by this push: new 265ea2fea6 Fix typo in 2.9 blog post (#1000) 265ea2fea6 is described below commit 265ea2fea662e28996fbbf8eb24b15a64c4773bb Author: Jed Cunningham <66968678+jedcunning...@users.noreply.github.com> AuthorDate: Mon Apr 8 17:34:56 2024 -0400 Fix typo in 2.9 blog post (#1000) --- .../airflow-2.9.0/custom-log-grouping-expanded.png | Bin 221429 -> 222171 bytes .../site/content/en/blog/airflow-2.9.0/index.md| 2 +- 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png index 0b209264c3..80b1f063e8 100644 Binary files a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png and b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png differ diff --git a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md index b8217f5d5c..9a502cfc8f 100644 --- a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md +++ b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md @@ -154,7 +154,7 @@ def big_hello(): greeting = "" for c in "Hello Airflow 2.9": greeting += c -print(f"Adding {c} to out greeting. Current greeting: {greeting}") +print(f"Adding {c} to our greeting. Current greeting: {greeting}") print("::endgroup::") print(greeting) ```
Re: [PR] Fix typo in 2.9 blog post [airflow-site]
jedcunningham merged PR #1000: URL: https://github.com/apache/airflow-site/pull/1000 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow-site) branch 2.9.0-blog-post-typo deleted (was d2db4716ad)
This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a change to branch 2.9.0-blog-post-typo in repository https://gitbox.apache.org/repos/asf/airflow-site.git was d2db4716ad Fix typo in 2.9 blog post The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
Re: [PR] Make _run_task_by_local_task_job compatible with internal API [airflow]
dstandish commented on PR #38563: URL: https://github.com/apache/airflow/pull/38563#issuecomment-2043674071 ok, renaming this one to `Add session that blows up when using internal API` because, with the addition of this, there's no longer a need to make the other changes, I think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
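A minimal sketch of such a "blows up" session factory, assuming a boolean flag for internal-API mode; none of these names are the PR's actual code:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

USE_INTERNAL_API = True  # assumption: in Airflow this would come from configuration

_engine = create_engine("sqlite://")
_SessionFactory = sessionmaker(bind=_engine)

def create_session():
    """Return an ORM session, or fail loudly when DB access must go through the internal API."""
    if USE_INTERNAL_API:
        raise RuntimeError(
            "Direct database access is not allowed when the internal API is enabled; "
            "route this call through an RPC endpoint instead."
        )
    return _SessionFactory()
```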
Re: [PR] Introduce new config variable to control whether DAG processor outputs to stdout [airflow]
dimon222 commented on PR #37439: URL: https://github.com/apache/airflow/pull/37439#issuecomment-2043662173 How do I select output ONLY to stdout? If I write to both locations, I risk an OOM in a container environment, and I don't want to persist logs perpetually in a PVC. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
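For illustration, a generic `logging` setup that writes only to stdout, with no file handler so nothing accumulates on a PVC; the logger name is an assumption and this is not the config option the PR introduces:

```python
import logging
import sys

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s - %(message)s"))

log = logging.getLogger("dag_processor")  # hypothetical logger name
log.handlers = [handler]   # stdout only: replace any file handlers
log.setLevel(logging.INFO)
log.propagate = False      # keep records away from root handlers that might write files

log.info("DAG processor log line goes to stdout only")
```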
(airflow-site) branch gh-pages updated (12ec5916dd -> 613224189f)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch gh-pages in repository https://gitbox.apache.org/repos/asf/airflow-site.git discard 12ec5916dd Rewritten history to remove past gh-pages deployments new 613224189f Rewritten history to remove past gh-pages deployments This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (12ec5916dd) \ N -- N -- N refs/heads/gh-pages (613224189f) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: blog/airflow-1.10.10/index.html| 4 +- blog/airflow-1.10.12/index.html| 4 +- blog/airflow-1.10.8-1.10.9/index.html | 4 +- blog/airflow-2.2.0/index.html | 4 +- blog/airflow-2.3.0/index.html | 4 +- blog/airflow-2.4.0/index.html | 4 +- blog/airflow-2.5.0/index.html | 4 +- blog/airflow-2.6.0/index.html | 4 +- blog/airflow-2.7.0/index.html | 4 +- blog/airflow-2.8.0/index.html | 4 +- blog/airflow-2.9.0/index.html | 4 +- blog/airflow-survey-2020/index.html| 4 +- blog/airflow-survey-2022/index.html| 4 +- blog/airflow-survey/index.html | 4 +- blog/airflow-two-point-oh-is-here/index.html | 4 +- blog/airflow_summit_2021/index.html| 4 +- blog/airflow_summit_2022/index.html| 4 +- blog/announcing-new-website/index.html | 4 +- blog/apache-airflow-for-newcomers/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- blog/fab-oid-vulnerability/index.html | 4 +- .../index.html | 4 +- blog/introducing_setup_teardown/index.html | 4 +- .../index.html | 4 +- community/index.html | 30 + search/index.html | 4 +- sitemap.xml| 136 ++--- use-cases/adobe/index.html | 4 +- use-cases/adyen/index.html | 4 +- use-cases/big-fish-games/index.html| 4 +- use-cases/business_operations/index.html | 4 +- use-cases/dish/index.html | 4 +- use-cases/etl_analytics/index.html | 4 +- use-cases/experity/index.html | 4 +- use-cases/infrastructure-management/index.html | 4 +- use-cases/mlops/index.html | 4 +- use-cases/onefootball/index.html | 4 +- use-cases/plarium-krasnodar/index.html | 4 +- use-cases/seniorlink/index.html| 4 +- use-cases/sift/index.html | 4 +- use-cases/snapp/index.html | 4 +- use-cases/suse/index.html | 4 +- 45 files changed, 184 insertions(+), 154 deletions(-)
Re: [PR] fix: try002 for provider imap [airflow]
Taragolis commented on code in PR #38804: URL: https://github.com/apache/airflow/pull/38804#discussion_r1556428173 ## airflow/providers/imap/hooks/imap.py: ## @@ -219,7 +219,7 @@ def _retrieve_mails_attachments_by_name( self, name: str, check_regex: bool, latest_only: bool, mail_folder: str, mail_filter: str ) -> list: if not self.mail_client: -raise Exception("The 'mail_client' should be initialized before!") +raise AirflowException("The 'mail_client' should be initialized before!") Review Comment: ```suggestion raise RuntimeError("The 'mail_client' should be initialized before!") ``` I guess RuntimeError would be fine in this case, when we expects that some of the values are initialised before -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow) branch main updated: fix: try002 for provider common sql (#38800)
This is an automated email from the ASF dual-hosted git repository. taragolis pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 94153d70ac fix: try002 for provider common sql (#38800) 94153d70ac is described below commit 94153d70ac894d7c5249d183304646995d5df3e4 Author: Sebastian Daum AuthorDate: Mon Apr 8 23:00:12 2024 +0200 fix: try002 for provider common sql (#38800) --- airflow/providers/common/sql/hooks/sql.py | 10 +++--- pyproject.toml| 2 -- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/airflow/providers/common/sql/hooks/sql.py b/airflow/providers/common/sql/hooks/sql.py index 3f324e4f69..7f1536a39b 100644 --- a/airflow/providers/common/sql/hooks/sql.py +++ b/airflow/providers/common/sql/hooks/sql.py @@ -40,7 +40,11 @@ import sqlparse from more_itertools import chunked from sqlalchemy import create_engine -from airflow.exceptions import AirflowException, AirflowProviderDeprecationWarning +from airflow.exceptions import ( +AirflowException, +AirflowOptionalProviderFeatureException, +AirflowProviderDeprecationWarning, +) from airflow.hooks.base import BaseHook if TYPE_CHECKING: @@ -230,7 +234,7 @@ class DbApiHook(BaseHook): try: from pandas.io import sql as psql except ImportError: -raise Exception( +raise AirflowOptionalProviderFeatureException( "pandas library not installed, run: pip install " "'apache-airflow-providers-common-sql[pandas]'." ) @@ -257,7 +261,7 @@ class DbApiHook(BaseHook): try: from pandas.io import sql as psql except ImportError: -raise Exception( +raise AirflowOptionalProviderFeatureException( "pandas library not installed, run: pip install " "'apache-airflow-providers-common-sql[pandas]'." ) diff --git a/pyproject.toml b/pyproject.toml index bac8b3126c..58d1b31c4b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -381,8 +381,6 @@ combine-as-imports = true # All the providers modules which do not follow TRY002 yet # cncf.kubernetes "airflow/providers/cncf/kubernetes/operators/pod.py" = ["TRY002"] -# common.sql -"airflow/providers/common/sql/hooks/sql.py" = ["TRY002"] # google "airflow/providers/google/cloud/hooks/bigquery.py" = ["TRY002"] "airflow/providers/google/cloud/hooks/dataflow.py" = ["TRY002"]
Re: [PR] fix: try002 for provider common sql [airflow]
Taragolis merged PR #38800: URL: https://github.com/apache/airflow/pull/38800 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] GlueJobOperator failing with Invalid type for parameter GlueVersion for provider version >= 7.1.0. continuation of #32404 [airflow]
Taragolis commented on issue #38848: URL: https://github.com/apache/airflow/issues/38848#issuecomment-2043607725

``render_template_as_native_obj=True`` forces the use of `jinja2.nativetypes.NativeEnvironment`, which converts rendered values to native Python types. See this example:

```python
from jinja2.sandbox import SandboxedEnvironment
from jinja2.nativetypes import NativeEnvironment

environments = {"sandboxed": SandboxedEnvironment(), "native": NativeEnvironment()}
template = "4.0"
for name, env in environments.items():
    print(f" Jinja2 Environment: {name} ".center(72, "="))
    rendered = env.from_string(template).render()
    print(f"Template: {template!r}")
    print(f"Rendered: {rendered!r}, Type: {type(rendered).__name__!r}")
```

```console
==================== Jinja2 Environment: sandboxed =====================
Template: '4.0'
Rendered: '4.0', Type: 'str'
====================== Jinja2 Environment: native ======================
Template: '4.0'
Rendered: 4.0, Type: 'float'
```

There are two options here.

**Option 1**: Do not use `jinja2.nativetypes.NativeEnvironment`, i.e. do not set `render_template_as_native_obj=True`

**Option 2** (Airflow 2.8+): Use the `airflow.template.templater.LiteralValue` wrapper (see: https://github.com/apache/airflow/pull/35017), which prevents the value from being rendered as a template, e.g.

```diff
+from airflow.template.templater import LiteralValue
 ...
     create_job_kwargs={
         'WorkerType': 'G.2X',
         'NumberOfWorkers': '1',
         'DefaultArguments': {
             '--datalake-formats': 'iceberg',
             '--additional-python-modules': 'importlib',
         },
-        'GlueVersion': '4.0'
+        'GlueVersion': LiteralValue('4.0')
     },
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Fix typo in 2.9 blog post [airflow-site]
jedcunningham opened a new pull request, #1000: URL: https://github.com/apache/airflow-site/pull/1000 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow-site) 01/01: Fix typo in 2.9 blog post
This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a commit to branch 2.9.0-blog-post-typo in repository https://gitbox.apache.org/repos/asf/airflow-site.git commit d2db4716ad049c85cd63c460e2e57cddfb612a02 Author: Jed Cunningham AuthorDate: Mon Apr 8 15:25:15 2024 -0500 Fix typo in 2.9 blog post --- .../airflow-2.9.0/custom-log-grouping-expanded.png | Bin 221429 -> 222171 bytes .../site/content/en/blog/airflow-2.9.0/index.md| 2 +- 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png index 0b209264c3..80b1f063e8 100644 Binary files a/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png and b/landing-pages/site/content/en/blog/airflow-2.9.0/custom-log-grouping-expanded.png differ diff --git a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md index b8217f5d5c..9a502cfc8f 100644 --- a/landing-pages/site/content/en/blog/airflow-2.9.0/index.md +++ b/landing-pages/site/content/en/blog/airflow-2.9.0/index.md @@ -154,7 +154,7 @@ def big_hello(): greeting = "" for c in "Hello Airflow 2.9": greeting += c -print(f"Adding {c} to out greeting. Current greeting: {greeting}") +print(f"Adding {c} to our greeting. Current greeting: {greeting}") print("::endgroup::") print(greeting) ```
(airflow-site) branch 2.9.0-blog-post-typo created (now d2db4716ad)
This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a change to branch 2.9.0-blog-post-typo in repository https://gitbox.apache.org/repos/asf/airflow-site.git at d2db4716ad Fix typo in 2.9 blog post This branch includes the following new commits: new d2db4716ad Fix typo in 2.9 blog post The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
(airflow) branch main updated: Amazon Bedrock - Model Customization Jobs (#38693)
This is an automated email from the ASF dual-hosted git repository. ferruzzi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 7ed31d5fdf Amazon Bedrock - Model Customization Jobs (#38693) 7ed31d5fdf is described below commit 7ed31d5fdf510e00528522ea313a20b19e498522 Author: D. Ferruzzi AuthorDate: Mon Apr 8 13:22:16 2024 -0700 Amazon Bedrock - Model Customization Jobs (#38693) * Amazon Bedrock - Customize Model Operator/Sensor/Waiter/Trigger --- airflow/providers/amazon/aws/hooks/bedrock.py | 20 +++ airflow/providers/amazon/aws/operators/bedrock.py | 161 - airflow/providers/amazon/aws/sensors/bedrock.py| 110 ++ airflow/providers/amazon/aws/triggers/bedrock.py | 61 airflow/providers/amazon/aws/waiters/bedrock.json | 42 ++ airflow/providers/amazon/provider.yaml | 6 + .../operators/bedrock.rst | 38 + tests/providers/amazon/aws/hooks/test_bedrock.py | 36 - .../providers/amazon/aws/operators/test_bedrock.py | 161 ++--- tests/providers/amazon/aws/sensors/test_bedrock.py | 95 .../providers/amazon/aws/triggers/test_bedrock.py | 53 +++ tests/providers/amazon/aws/waiters/test_bedrock.py | 70 + .../system/providers/amazon/aws/example_bedrock.py | 106 +- 13 files changed, 929 insertions(+), 30 deletions(-) diff --git a/airflow/providers/amazon/aws/hooks/bedrock.py b/airflow/providers/amazon/aws/hooks/bedrock.py index 11bacd9414..96636eb952 100644 --- a/airflow/providers/amazon/aws/hooks/bedrock.py +++ b/airflow/providers/amazon/aws/hooks/bedrock.py @@ -19,6 +19,26 @@ from __future__ import annotations from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook +class BedrockHook(AwsBaseHook): +""" +Interact with Amazon Bedrock. + +Provide thin wrapper around :external+boto3:py:class:`boto3.client("bedrock") `. + +Additional arguments (such as ``aws_conn_id``) may be specified and +are passed down to the underlying AwsBaseHook. + +.. seealso:: +- :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook` +""" + +client_type = "bedrock" + +def __init__(self, *args, **kwargs) -> None: +kwargs["client_type"] = self.client_type +super().__init__(*args, **kwargs) + + class BedrockRuntimeHook(AwsBaseHook): """ Interact with the Amazon Bedrock Runtime.
diff --git a/airflow/providers/amazon/aws/operators/bedrock.py b/airflow/providers/amazon/aws/operators/bedrock.py index d8eaf9e5d3..ee34a9aef7 100644 --- a/airflow/providers/amazon/aws/operators/bedrock.py +++ b/airflow/providers/amazon/aws/operators/bedrock.py @@ -19,10 +19,17 @@ from __future__ import annotations import json from typing import TYPE_CHECKING, Any, Sequence -from airflow.providers.amazon.aws.hooks.bedrock import BedrockRuntimeHook +from botocore.exceptions import ClientError + +from airflow.configuration import conf +from airflow.exceptions import AirflowException +from airflow.providers.amazon.aws.hooks.bedrock import BedrockHook, BedrockRuntimeHook from airflow.providers.amazon.aws.operators.base_aws import AwsBaseOperator +from airflow.providers.amazon.aws.triggers.bedrock import BedrockCustomizeModelCompletedTrigger +from airflow.providers.amazon.aws.utils import validate_execute_complete_event from airflow.providers.amazon.aws.utils.mixins import aws_template_fields from airflow.utils.helpers import prune_dict +from airflow.utils.timezone import utcnow if TYPE_CHECKING: from airflow.utils.context import Context @@ -91,3 +98,155 @@ class BedrockInvokeModelOperator(AwsBaseOperator[BedrockRuntimeHook]): self.log.info("Bedrock %s prompt: %s", self.model_id, self.input_data) self.log.info("Bedrock model response: %s", response_body) return response_body + + +class BedrockCustomizeModelOperator(AwsBaseOperator[BedrockHook]): +""" +Create a fine-tuning job to customize a base model. + +.. seealso:: +For more information on how to use this operator, take a look at the guide: +:ref:`howto/operator:BedrockCustomizeModelOperator` + +:param job_name: A unique name for the fine-tuning job. +:param custom_model_name: A name for the custom model being created. +:param role_arn: The Amazon Resource Name (ARN) of an IAM role that Amazon Bedrock can assume +to perform tasks on your behalf. +:param base_model_id: Name of the base model. +:param training_data_uri: The S3 URI where the training data is stored. +:param output_data_uri: The S3 URI where the output data is stored. +:param hyperparameters: Parameters related to tuning the model. +:param ensure_unique_job_name: If set to true, operator will check whether a model customization +job
(airflow-site) branch add-weilee-to-committer-list deleted (was 43cd4e89d8)
This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a change to branch add-weilee-to-committer-list in repository https://gitbox.apache.org/repos/asf/airflow-site.git was 43cd4e89d8 add Wei Lee to committer list The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
(airflow-site) branch main updated: add Wei Lee to committer list (#996)
This is an automated email from the ASF dual-hosted git repository. jedcunningham pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow-site.git The following commit(s) were added to refs/heads/main by this push: new e1c24fbde7 add Wei Lee to committer list (#996) e1c24fbde7 is described below commit e1c24fbde7eaa8157a3baa207ca271fb9f82b2b7 Author: Wei Lee AuthorDate: Tue Apr 9 04:22:57 2024 +0800 add Wei Lee to committer list (#996) --- landing-pages/site/data/committers.json | 6 ++ 1 file changed, 6 insertions(+) diff --git a/landing-pages/site/data/committers.json b/landing-pages/site/data/committers.json index 71e5b6a188..5242472022 100644 --- a/landing-pages/site/data/committers.json +++ b/landing-pages/site/data/committers.json @@ -161,6 +161,12 @@ "image": "https://github.com/vincbeck.png", "nick": "vincbeck" }, + { +"name": "Wei Lee", +"github": "https://github.com/Lee-W", +"image": "https://github.com/Lee-W.png", +"nick": "Lee-W" + }, { "name": "Xinbin Huang", "github": "https://github.com/xinbinhuang",
Re: [PR] add Wei Lee to committer list [airflow-site]
jedcunningham merged PR #996: URL: https://github.com/apache/airflow-site/pull/996 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]
ferruzzi merged PR #38693: URL: https://github.com/apache/airflow/pull/38693 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Fix eks.py SyntaxWarning: invalid escape sequence '\s' [airflow]
vincbeck commented on PR #38734: URL: https://github.com/apache/airflow/pull/38734#issuecomment-2043541309 Hi @dabla, it seems this PR introduced a bug. All our system tests related to AWS EKS started to fail for the same reason. See error below: ``` ERROR [root] exec: failed to decode process output: Expecting value: line 1 column 41 (char 40) INFO [airflow.models.taskinstance] ::group::Post task execution logs ERROR [airflow.task] Task failed with exception Traceback (most recent call last): File "/opt/airflow/airflow/models/taskinstance.py", line 478, in _execute_task result = _execute_callable(context=context, **execute_callable_kwargs) File "/opt/airflow/airflow/models/taskinstance.py", line 441, in _execute_callable return ExecutionCallableRunner( File "/opt/airflow/airflow/utils/operator_helpers.py", line 250, in run return self.func(*args, **kwargs) File "/opt/airflow/airflow/models/baseoperator.py", line 405, in wrapper return func(self, *args, **kwargs) File "/opt/airflow/airflow/providers/amazon/aws/operators/eks.py", line 1103, in execute return super().execute(context) File "/opt/airflow/airflow/models/baseoperator.py", line 405, in wrapper return func(self, *args, **kwargs) File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", line 578, in execute return self.execute_sync(context) File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", line 586, in execute_sync self.pod = self.get_or_create_pod( # must set `self.pod` for `on_kill` File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", line 542, in get_or_create_pod pod = self.find_pod(self.namespace or pod_request_obj.metadata.namespace, context=context) File "/opt/airflow/airflow/providers/cncf/kubernetes/operators/pod.py", line 524, in find_pod pod_list = self.client.list_namespaced_pod( File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 15823, in list_namespaced_pod return self.list_namespaced_pod_with_http_info(namespace, **kwargs) # noqa: E501 File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 15942, in list_namespaced_pod_with_http_info return self.api_client.call_api( File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api return self.__call_api(resource_path, method, File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api response_data = self.request( File "/usr/local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 373, in request return self.rest_client.GET(url, File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 244, in GET return self.request("GET", url, File "/usr/local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 238, in request raise ApiException(http_resp=r) kubernetes.client.exceptions.ApiException: (403) Reason: Forbidden HTTP response headers: HTTPHeaderDict({'Audit-Id': 'dede46b8-8f1d-496f-b005-5d49cf69f3a8', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '70b99781-63dc-4f6a-8e55-475c91222bbc', 'X-Kubernetes-Pf-Prioritylevel-Uid': '64a41238-df95-49a3-b050-f3a551eb9d19', 'Date': 'Mon, 08 Apr 2024 18:12:50 GMT', 'Content-Length': '261'}) HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:anonymous\" cannot list resource \"pods\" in API group \"\" in the 
namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403} ``` I am afraid the bug impacts not only AWS system tests but also users. I will create a PR to revert this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Automate links to configuration of providers [airflow]
poorvirohidekar commented on issue #34306: URL: https://github.com/apache/airflow/issues/34306#issuecomment-2043511356 @potiuk - From the file mentioned, I understand that the class AuthConfigurations renders the template configurations.rst.jinja2, and when I look at that file (https://github.com/apache/airflow/blob/main/docs/exts/templates/configuration.rst.jinja2), a similar pattern of links already seems to be getting generated there. I am not sure whether AuthConfigurations is the relevant existing directive for this issue. If not, can you help me understand how the links are currently being generated, or point me in the right direction? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] GlueJobOperator failing with Invalid type for parameter GlueVersion for provider version >= 7.1.0. continuation of #32404 [airflow]
don1uppa opened a new issue, #38848: URL: https://github.com/apache/airflow/issues/38848 ### Apache Airflow version Other Airflow 2 version (please specify below) ### If "Other Airflow 2 version" selected, which one? Version: v2.8.1 current mwaa release ### What happened? [2024-04-08, 18:58:44 UTC] {{glue.py:189}} ERROR - Failed to run aws glue job, error: Parameter validation failed: Invalid type for parameter GlueVersion, value: 4.0, type: , valid types: Here is the full stack trace: ip-10-0-20-19.ec2.internal *** Reading remote log from Cloudwatch log_group: airflow-test-mwaa-Task log_stream: dag_id=glue_operator_failure/run_id=manual__2024-04-08T18_56_04.206370+00_00/task_id=submit_glue_job/attempt=2.log. [2024-04-08, 18:58:43 UTC] {{taskinstance.py:1956}} INFO - Dependencies all met for dep_context=non-requeueable deps ti= [2024-04-08, 18:58:43 UTC] {{taskinstance.py:1956}} INFO - Dependencies all met for dep_context=requeueable deps ti= [2024-04-08, 18:58:43 UTC] {{taskinstance.py:2170}} INFO - Starting attempt 2 of 2 [2024-04-08, 18:58:43 UTC] {{taskinstance.py:2191}} INFO - Executing on 2024-04-08 18:56:04.206370+00:00 [2024-04-08, 18:58:43 UTC] {{standard_task_runner.py:60}} INFO - Started process 18070 to run task [2024-04-08, 18:58:43 UTC] {{standard_task_runner.py:87}} INFO - Running: ['airflow', 'tasks', 'run', 'glue_operator_failure', 'submit_glue_job', 'manual__2024-04-08T18:56:04.206370+00:00', '--job-id', '5118', '--raw', '--subdir', 'DAGS_FOLDER/test_dags/glue_operator_failure.py', '--cfg-path', '/tmp/tmpbdmm1cz0'] [2024-04-08, 18:58:43 UTC] {{standard_task_runner.py:88}} INFO - Job 5118: Subtask submit_glue_job [2024-04-08, 18:58:43 UTC] {{task_command.py:423}} INFO - Running on host ip-10-0-20-19.ec2.internal [2024-04-08, 18:58:43 UTC] {{taskinstance.py:2480}} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='glue_operator_failure' AIRFLOW_CTX_TASK_ID='submit_glue_job' AIRFLOW_CTX_EXECUTION_DATE='2024-04-08T18:56:04.206370+00:00' AIRFLOW_CTX_TRY_NUMBER='2' AIRFLOW_CTX_DAG_RUN_ID='manual__2024-04-08T18:56:04.206370+00:00' [2024-04-08, 18:58:43 UTC] {{glue.py:172}} INFO - Initializing AWS Glue Job: simple ls. Wait for completion: True [2024-04-08, 18:58:43 UTC] {{base.py:83}} INFO - Using connection ID 'aws_default' for task execution. 
[2024-04-08, 18:58:43 UTC] {{glue.py:161}} INFO - Iam Role Name: test-glue-jobs [2024-04-08, 18:58:43 UTC] {{glue.py:356}} INFO - Checking if job already exists: simple ls [2024-04-08, 18:58:44 UTC] {{glue.py:421}} INFO - Creating job: simple ls [2024-04-08, 18:58:44 UTC] {{glue.py:189}} ERROR - Failed to run aws glue job, error: Parameter validation failed: Invalid type for parameter GlueVersion, value: 4.0, type: , valid types: [2024-04-08, 18:58:44 UTC] {{taskinstance.py:2698}} ERROR - Task failed with exception Traceback (most recent call last): File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 433, in _execute_task result = execute_callable(context=context, **execute_callable_kwargs) File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 177, in execute glue_job_run = self.glue_job_hook.initialize_job(self.script_args, self.run_job_kwargs) File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 183, in initialize_job job_name = self.create_or_update_glue_job() File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 422, in create_or_update_glue_job self.conn.create_job(**config) File "/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/client.py", line 553, in _api_call return self._make_api_call(operation_name, kwargs) ^^^ File "/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/client.py", line 962, in _make_api_call request_dict = self._convert_to_request_dict( ^^ File "/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/client.py", line 1036, in _convert_to_request_dict request_dict = self._serializer.serialize_to_request( ^^ File "/usr/local/airflow/.local/lib/python3.11/site-packages/botocore/validate.py", line 381, in serialize_to_request raise ParamValidationError(report=report.generate_report())
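The root cause visible in the traceback is boto3's client-side parameter validation: Glue's `create_job` expects `GlueVersion` as a string, so a float `4.0` is rejected before the request is ever sent. A hedged workaround sketch (the job, script, and role names below are placeholders, not taken from the report) that quotes the version in `create_job_kwargs`:

```python
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Hypothetical task definition; the quoted GlueVersion is the point here.
submit_glue_job = GlueJobOperator(
    task_id="submit_glue_job",
    job_name="simple ls",
    script_location="s3://example-bucket/scripts/simple_ls.py",
    iam_role_name="test-glue-jobs",
    create_job_kwargs={
        "GlueVersion": "4.0",  # must be a string, not the float 4.0
        "NumberOfWorkers": 2,
        "WorkerType": "G.1X",
    },
)
```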
(airflow) branch main updated: Airflow 2.9.0 has been released (#38837)
This is an automated email from the ASF dual-hosted git repository. ephraimanierobi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 14a2b2cf98 Airflow 2.9.0 has been released (#38837) 14a2b2cf98 is described below commit 14a2b2cf98237658c500f3005c30c657c273d513 Author: Ephraim Anierobi AuthorDate: Mon Apr 8 20:01:31 2024 +0100 Airflow 2.9.0 has been released (#38837) --- .github/ISSUE_TEMPLATE/airflow_bug_report.yml | 3 +- Dockerfile | 2 +- README.md | 10 +- RELEASE_NOTES.rst | 295 + airflow/reproducible_build.yaml| 4 +- .../installation/supported-versions.rst| 2 +- generated/PYPI_README.md | 8 +- newsfragments/36376.significant.rst| 18 -- newsfragments/36514.significant.rst| 13 - newsfragments/37005.significant.rst| 10 - newsfragments/37176.significant.rst| 1 - newsfragments/37627.significant.rst| 1 - newsfragments/37734.significant.rst| 3 - newsfragments/37915.significant.rst| 1 - newsfragments/37925.bugfix | 1 - newsfragments/37988.significant.rst| 1 - newsfragments/38025.significant.rst| 19 -- newsfragments/38094.significant.rst| 5 - newsfragments/38401.significant.rst| 5 - scripts/ci/pre_commit/supported_versions.py| 2 +- 20 files changed, 310 insertions(+), 94 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/airflow_bug_report.yml b/.github/ISSUE_TEMPLATE/airflow_bug_report.yml index 0b9ed94944..47e3793f27 100644 --- a/.github/ISSUE_TEMPLATE/airflow_bug_report.yml +++ b/.github/ISSUE_TEMPLATE/airflow_bug_report.yml @@ -25,8 +25,7 @@ body: the latest release or main to see if the issue is fixed before reporting it. multiple: false options: -- "2.9.0b2" -- "2.8.4" +- "2.9.0" - "main (development)" - "Other Airflow 2 version (please specify below)" validations: diff --git a/Dockerfile b/Dockerfile index 292a378432..94798fbdb7 100644 --- a/Dockerfile +++ b/Dockerfile @@ -45,7 +45,7 @@ ARG AIRFLOW_UID="5" ARG AIRFLOW_USER_HOME_DIR=/home/airflow # latest released version here -ARG AIRFLOW_VERSION="2.8.4" +ARG AIRFLOW_VERSION="2.9.0" ARG PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" diff --git a/README.md b/README.md index 1f618e51fa..107b0ae81d 100644 --- a/README.md +++ b/README.md @@ -98,7 +98,7 @@ Airflow is not a streaming solution, but it is often used to process real-time d Apache Airflow is tested with: -| | Main version (dev) | Stable version (2.8.4) | +| | Main version (dev) | Stable version (2.9.0) | |-||-| | Python | 3.8, 3.9, 3.10, 3.11, 3.12 | 3.8, 3.9, 3.10, 3.11| | Platform| AMD64/ARM64(\*)| AMD64/ARM64(\*) | @@ -180,15 +180,15 @@ them to the appropriate format and workflow that your tool requires. ```bash -pip install 'apache-airflow==2.8.4' \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.4/constraints-3.8.txt" +pip install 'apache-airflow==2.9.0' \ + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.0/constraints-3.8.txt" ``` 2.
Installing with extras (i.e., postgres, google) ```bash pip install 'apache-airflow[postgres,google]==2.8.3' \ - --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.4/constraints-3.8.txt" + --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.0/constraints-3.8.txt" ``` For information on installing provider packages, check @@ -293,7 +293,7 @@ Apache Airflow version life cycle: | Version | Current Patch/Minor | State | First Release | Limited Support | EOL/Terminated | |---|---|---|-|---|--| -| 2 | 2.8.4 | Supported | Dec 17, 2020| TBD | TBD | +| 2 | 2.9.0 | Supported | Dec 17, 2020| TBD | TBD | | 1.10 | 1.10.15 | EOL | Aug 27, 2018| Dec 17, 2020 | June 17, 2021| | 1.9 | 1.9.0 | EOL | Jan 03, 2018| Aug 27, 2018 | Aug 27, 2018 | | 1.8 | 1.8.2 | EOL | Mar 19, 2017| Jan 03, 2018 | Jan 03, 2018 | diff --git a/RELEASE_NOTES.rst b/RELEASE_NOTES.rst index
Re: [PR] Airflow 2.9.0 has been released [airflow]
ephraimbuddy merged PR #38837: URL: https://github.com/apache/airflow/pull/38837 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow) branch main updated: Chart: Default Airflow version to 2.9.0 (#38839)
This is an automated email from the ASF dual-hosted git repository. ephraimanierobi pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 1d70bbf683 Chart: Default Airflow version to 2.9.0 (#38839) 1d70bbf683 is described below commit 1d70bbf68336476605a47651da20f7ca6ceae542 Author: Ephraim Anierobi AuthorDate: Mon Apr 8 19:32:07 2024 +0100 Chart: Default Airflow version to 2.9.0 (#38839) * Chart: Default Airflow version to 2.9.0 * fixup! Chart: Default Airflow version to 2.9.0 --- chart/Chart.yaml | 20 ++-- chart/newsfragments/38478.significant.rst | 3 --- chart/newsfragments/38839.significant.rst | 3 +++ chart/values.schema.json | 4 ++-- chart/values.yaml | 4 ++-- 5 files changed, 17 insertions(+), 17 deletions(-) diff --git a/chart/Chart.yaml b/chart/Chart.yaml index 659a36a628..29ee9ac5b7 100644 --- a/chart/Chart.yaml +++ b/chart/Chart.yaml @@ -20,7 +20,7 @@ apiVersion: v2 name: airflow version: 1.14.0 -appVersion: 2.8.4 +appVersion: 2.9.0 description: The official Helm chart to deploy Apache Airflow, a platform to programmatically author, schedule, and monitor workflows home: https://airflow.apache.org/ @@ -47,23 +47,23 @@ annotations: url: https://airflow.apache.org/docs/helm-chart/1.14.0/ artifacthub.io/screenshots: | - title: DAGs View - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/dags.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/dags.png - title: Datasets View - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/datasets.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/datasets.png - title: Grid View - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/grid.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/grid.png - title: Graph View - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/graph.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/graph.png - title: Calendar View - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/calendar.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/calendar.png - title: Variable View - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/variable_hidden.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/variable_hidden.png - title: Gantt Chart - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/gantt.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/gantt.png - title: Task Duration - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/duration.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/duration.png - title: Code View - url: https://airflow.apache.org/docs/apache-airflow/2.8.4/_images/code.png + url: https://airflow.apache.org/docs/apache-airflow/2.9.0/_images/code.png artifacthub.io/changes: | - description: Don't overwrite ``.Values.airflowPodAnnotations`` kind: fixed diff --git a/chart/newsfragments/38478.significant.rst b/chart/newsfragments/38478.significant.rst deleted file mode 100644 index 9200834d00..00 --- a/chart/newsfragments/38478.significant.rst +++ /dev/null @@ -1,3 +0,0 @@ -Default Airflow image is updated to ``2.8.4`` - -The default Airflow image that is used with the Chart is now ``2.8.4``, previously it was ``2.8.3``. 
diff --git a/chart/newsfragments/38839.significant.rst b/chart/newsfragments/38839.significant.rst new file mode 100644 index 00..8f913bf8bc --- /dev/null +++ b/chart/newsfragments/38839.significant.rst @@ -0,0 +1,3 @@ +Default Airflow image is updated to ``2.9.0`` + +The default Airflow image that is used with the Chart is now ``2.9.0``, previously it was ``2.8.3``. diff --git a/chart/values.schema.json b/chart/values.schema.json index 47b07c7d44..c4dbe47464 100644 --- a/chart/values.schema.json +++ b/chart/values.schema.json @@ -77,7 +77,7 @@ "defaultAirflowTag": { "description": "Default airflow tag to deploy.", "type": "string", -"default": "2.8.4", +"default": "2.9.0", "x-docsSection": "Common" }, "defaultAirflowDigest": { @@ -92,7 +92,7 @@ "airflowVersion": { "description": "Airflow version (Used to make some decisions based on Airflow Version being deployed).", "type": "string", -"default": "2.8.4", +"default": "2.9.0", "x-docsSection": "Common" }, "securityContext": { diff --git a/chart/values.yaml
Re: [PR] Chart: Default Airflow version to 2.9.0 [airflow]
ephraimbuddy merged PR #38839: URL: https://github.com/apache/airflow/pull/38839 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Change the UI color for ExternalSensor tasks [airflow]
idantepper commented on issue #38845: URL: https://github.com/apache/airflow/issues/38845#issuecomment-2043379278 Hi, I would like to take this issue -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]
ferruzzi commented on code in PR #38693: URL: https://github.com/apache/airflow/pull/38693#discussion_r1556213468 ## airflow/providers/amazon/aws/operators/bedrock.py: ## @@ -91,3 +98,155 @@ def execute(self, context: Context) -> dict[str, str | int]: self.log.info("Bedrock %s prompt: %s", self.model_id, self.input_data) self.log.info("Bedrock model response: %s", response_body) return response_body + + +class BedrockCustomizeModelOperator(AwsBaseOperator[BedrockHook]): +""" +Create a fine-tuning job to customize a base model. + +.. seealso:: +For more information on how to use this operator, take a look at the guide: +:ref:`howto/operator:BedrockCustomizeModelOperator` + +:param job_name: A unique name for the fine-tuning job. +:param custom_model_name: A name for the custom model being created. +:param role_arn: The Amazon Resource Name (ARN) of an IAM role that Amazon Bedrock can assume +to perform tasks on your behalf. +:param base_model_id: Name of the base model. +:param training_data_uri: The S3 URI where the training data is stored. +:param output_data_uri: The S3 URI where the output data is stored. +:param hyperparameters: Parameters related to tuning the model. +:param ensure_unique_job_name: If set to true, operator will check whether a model customization +job already exists for the name in the config and append the current timestamp if there is a +name conflict. (Default: True) +:param customization_job_kwargs: Any optional parameters to pass to the API. + +:param wait_for_completion: Whether to wait for cluster to stop. (default: True) +:param waiter_delay: Time in seconds to wait between status checks. (default: 120) +:param waiter_max_attempts: Maximum number of attempts to check for job completion. (default: 75) +:param deferrable: If True, the operator will wait asynchronously for the cluster to stop. +This implies waiting for completion. This mode requires aiobotocore module to be installed. +(default: False) +:param aws_conn_id: The Airflow connection used for AWS credentials. +If this is ``None`` or empty then the default boto3 behaviour is used. If +running Airflow in a distributed manner and aws_conn_id is None or +empty, then default boto3 configuration would be used (and must be +maintained on each worker node). +:param region_name: AWS region_name. If not specified then the default boto3 behaviour is used. +:param verify: Whether or not to verify SSL certificates. See: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html +:param botocore_config: Configuration dictionary (key-values) for botocore client. 
See: + https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html +""" + +aws_hook_class = BedrockHook +template_fields: Sequence[str] = aws_template_fields( +"job_name", +"custom_model_name", +"role_arn", +"base_model_id", +"hyperparameters", +"ensure_unique_job_name", +"customization_job_kwargs", +) + +def __init__( +self, +job_name: str, +custom_model_name: str, +role_arn: str, +base_model_id: str, +training_data_uri: str, +output_data_uri: str, +hyperparameters: dict[str, str], +ensure_unique_job_name: bool = True, +customization_job_kwargs: dict[str, Any] | None = None, +wait_for_completion: bool = True, +waiter_delay: int = 120, +waiter_max_attempts: int = 75, +deferrable: bool = conf.getboolean("operators", "default_deferrable", fallback=False), +**kwargs, +): +super().__init__(**kwargs) +self.wait_for_completion = wait_for_completion +self.waiter_delay = waiter_delay +self.waiter_max_attempts = waiter_max_attempts +self.deferrable = deferrable + +self.job_name = job_name +self.custom_model_name = custom_model_name +self.role_arn = role_arn +self.base_model_id = base_model_id +self.training_data_config = {"s3Uri": training_data_uri} +self.output_data_config = {"s3Uri": output_data_uri} +self.hyperparameters = hyperparameters +self.ensure_unique_job_name = ensure_unique_job_name +self.customization_job_kwargs = customization_job_kwargs or {} + +self.valid_action_if_job_exists: set[str] = {"timestamp", "fail"} + +def execute_complete(self, context: Context, event: dict[str, Any] | None = None) -> str: +event = validate_execute_complete_event(event) + +if event["status"] != "success": +raise AirflowException(f"Error while running job: {event}") + +self.log.info("Bedrock model customization job `%s` complete.", self.job_name) +
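Pulling together the constructor arguments reviewed above, a minimal sketch of how the new operator might be instantiated in a DAG (all ARNs, URIs, and model ids below are placeholders, not values from the PR):

```python
from airflow.providers.amazon.aws.operators.bedrock import BedrockCustomizeModelOperator

# Hypothetical task definition based on the __init__ signature shown above.
customize_model = BedrockCustomizeModelOperator(
    task_id="customize_model",
    job_name="my-tuning-job",
    custom_model_name="my-custom-model",
    role_arn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    base_model_id="amazon.titan-text-express-v1",
    training_data_uri="s3://example-bucket/train.jsonl",
    output_data_uri="s3://example-bucket/output/",
    hyperparameters={"epochCount": "1", "batchSize": "1", "learningRate": "0.005"},
)
```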
Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]
ferruzzi commented on code in PR #38693: URL: https://github.com/apache/airflow/pull/38693#discussion_r1556206876 ## airflow/providers/amazon/aws/operators/bedrock.py: ## @@ -91,3 +98,155 @@ def execute(self, context: Context) -> dict[str, str | int]: self.log.info("Bedrock %s prompt: %s", self.model_id, self.input_data) self.log.info("Bedrock model response: %s", response_body) return response_body + + +class BedrockCustomizeModelOperator(AwsBaseOperator[BedrockHook]): +""" +Create a fine-tuning job to customize a base model. + +.. seealso:: +For more information on how to use this operator, take a look at the guide: +:ref:`howto/operator:BedrockCustomizeModelOperator` + +:param job_name: A unique name for the fine-tuning job. +:param custom_model_name: A name for the custom model being created. +:param role_arn: The Amazon Resource Name (ARN) of an IAM role that Amazon Bedrock can assume +to perform tasks on your behalf. +:param base_model_id: Name of the base model. +:param training_data_uri: The S3 URI where the training data is stored. +:param output_data_uri: The S3 URI where the output data is stored. +:param hyperparameters: Parameters related to tuning the model. +:param ensure_unique_job_name: If set to true, operator will check whether a model customization +job already exists for the name in the config and append the current timestamp if there is a +name conflict. (Default: True) +:param customization_job_kwargs: Any optional parameters to pass to the API. + +:param wait_for_completion: Whether to wait for cluster to stop. (default: True) +:param waiter_delay: Time in seconds to wait between status checks. (default: 120) +:param waiter_max_attempts: Maximum number of attempts to check for job completion. (default: 75) +:param deferrable: If True, the operator will wait asynchronously for the cluster to stop. +This implies waiting for completion. This mode requires aiobotocore module to be installed. +(default: False) +:param aws_conn_id: The Airflow connection used for AWS credentials. +If this is ``None`` or empty then the default boto3 behaviour is used. If +running Airflow in a distributed manner and aws_conn_id is None or +empty, then default boto3 configuration would be used (and must be +maintained on each worker node). +:param region_name: AWS region_name. If not specified then the default boto3 behaviour is used. +:param verify: Whether or not to verify SSL certificates. See: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html +:param botocore_config: Configuration dictionary (key-values) for botocore client. 
See: + https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html +""" + +aws_hook_class = BedrockHook +template_fields: Sequence[str] = aws_template_fields( +"job_name", +"custom_model_name", +"role_arn", +"base_model_id", +"hyperparameters", +"ensure_unique_job_name", +"customization_job_kwargs", +) + +def __init__( +self, +job_name: str, +custom_model_name: str, +role_arn: str, +base_model_id: str, +training_data_uri: str, +output_data_uri: str, +hyperparameters: dict[str, str], +ensure_unique_job_name: bool = True, +customization_job_kwargs: dict[str, Any] | None = None, +wait_for_completion: bool = True, +waiter_delay: int = 120, +waiter_max_attempts: int = 75, +deferrable: bool = conf.getboolean("operators", "default_deferrable", fallback=False), +**kwargs, +): +super().__init__(**kwargs) +self.wait_for_completion = wait_for_completion +self.waiter_delay = waiter_delay +self.waiter_max_attempts = waiter_max_attempts +self.deferrable = deferrable + +self.job_name = job_name +self.custom_model_name = custom_model_name +self.role_arn = role_arn +self.base_model_id = base_model_id +self.training_data_config = {"s3Uri": training_data_uri} +self.output_data_config = {"s3Uri": output_data_uri} +self.hyperparameters = hyperparameters +self.ensure_unique_job_name = ensure_unique_job_name +self.customization_job_kwargs = customization_job_kwargs or {} + +self.valid_action_if_job_exists: set[str] = {"timestamp", "fail"} + +def execute_complete(self, context: Context, event: dict[str, Any] | None = None) -> str: +event = validate_execute_complete_event(event) + +if event["status"] != "success": +raise AirflowException(f"Error while running job: {event}") + +self.log.info("Bedrock model customization job `%s` complete.", self.job_name) +
Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]
ferruzzi commented on code in PR #38693: URL: https://github.com/apache/airflow/pull/38693#discussion_r1556199352 ## airflow/providers/amazon/aws/sensors/bedrock.py: ## @@ -0,0 +1,111 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +from __future__ import annotations + +from typing import TYPE_CHECKING, Any, Sequence + +from airflow.configuration import conf +from airflow.providers.amazon.aws.sensors.base_aws import AwsBaseSensor +from airflow.providers.amazon.aws.triggers.bedrock import BedrockCustomizeModelCompletedTrigger +from airflow.providers.amazon.aws.utils.mixins import aws_template_fields + +if TYPE_CHECKING: +from airflow.utils.context import Context + +from airflow.exceptions import AirflowException, AirflowSkipException +from airflow.providers.amazon.aws.hooks.bedrock import BedrockHook + + +class BedrockCustomizeModelCompletedSensor(AwsBaseSensor[BedrockHook]): +""" +Poll the state of the model customization job until it reaches a terminal state; fails if the job fails. + +.. seealso:: +For more information on how to use this sensor, take a look at the guide: +:ref:`howto/sensor:BedrockCustomizeModelCompletedSensor` + + +:param job_name: The name of the Bedrock model customization job. + +:param deferrable: If True, the sensor will operate in deferrable mode. This mode requires aiobotocore +module to be installed. +(default: False, but can be overridden in config file by setting default_deferrable to True) +:param max_retries: Number of times before returning the current state. (default: 75) +:param poke_interval: Polling period in seconds to check for the status of the job. (default: 120) +:param aws_conn_id: The Airflow connection used for AWS credentials. +If this is ``None`` or empty then the default boto3 behaviour is used. If +running Airflow in a distributed manner and aws_conn_id is None or +empty, then default boto3 configuration would be used (and must be +maintained on each worker node). +:param region_name: AWS region_name. If not specified then the default boto3 behaviour is used. +:param verify: Whether or not to verify SSL certificates. See: + https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html +:param botocore_config: Configuration dictionary (key-values) for botocore client. See: + https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html +""" + +INTERMEDIATE_STATES = ("InProgress",) +FAILURE_STATES = ("Failed", "Stopping", "Stopped") +SUCCESS_STATES = ("Completed",) +FAILURE_MESSAGE = "Bedrock model customization job sensor failed." 
+ +aws_hook_class = BedrockHook +template_fields: Sequence[str] = aws_template_fields("job_name") +ui_color = "#66c3ff" + +def __init__( +self, +*, +job_name: str, +max_retries: int = 75, +poke_interval: int = 120, +deferrable: bool = conf.getboolean("operators", "default_deferrable", fallback=False), +**kwargs: Any, +) -> None: +super().__init__(**kwargs) +self.job_name = job_name +self.poke_interval = poke_interval +self.max_retries = max_retries +self.deferrable = deferrable + +def execute(self, context: Context) -> Any: +if self.deferrable: +self.defer( +trigger=BedrockCustomizeModelCompletedTrigger( +job_name=self.job_name, +waiter_delay=int(self.poke_interval), +waiter_max_attempts=self.max_retries, +aws_conn_id=self.aws_conn_id, +), +method_name="poke", +) +else: +super().execute(context=context) + +def poke(self, context: Context) -> bool: +state = self.hook.get_customize_model_job_state(self.job_name) + +if state in self.FAILURE_STATES: +# TODO: remove this if block when min_airflow_version is set to higher than 2.7.1 +if self.soft_fail: +raise AirflowSkipException(self.FAILURE_MESSAGE) +raise
Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]
ferruzzi commented on code in PR #38693: URL: https://github.com/apache/airflow/pull/38693#discussion_r1556195811 ## airflow/providers/amazon/aws/hooks/bedrock.py: ## @@ -19,6 +19,37 @@ from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook +class BedrockHook(AwsBaseHook): +""" +Interact with Amazon Bedrock. + +Provide thin wrapper around :external+boto3:py:class:`boto3.client("bedrock") `. + +Additional arguments (such as ``aws_conn_id``) may be specified and +are passed down to the underlying AwsBaseHook. + +.. seealso:: +- :class:`airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook` +""" + +client_type = "bedrock" + +def __init__(self, *args, **kwargs) -> None: +kwargs["client_type"] = self.client_type +super().__init__(*args, **kwargs) + +def _get_job_by_name(self, job_name: str): +return self.conn.get_model_customization_job(jobIdentifier=job_name) + +def get_customize_model_job_state(self, job_name: str) -> str: +state = self._get_job_by_name(job_name)["status"] +self.log.info("Job '%s' state: %s", job_name, state) +return state + +def get_job_arn(self, job_name: str) -> str: +return self._get_job_by_name(job_name)["jobArn"] Review Comment: Yeah, they felt handy and made the Sensor unit tests a little easier to mock, but I'll drop them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
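If those one-line wrappers are dropped as agreed, callers can make the same boto3 call the hook wrapped; a sketch under that assumption (the standalone function name is illustrative):

```python
from airflow.providers.amazon.aws.hooks.bedrock import BedrockHook

def get_customize_model_job_state(job_name: str) -> str:
    # The same API call _get_job_by_name wrapped in the diff above.
    hook = BedrockHook()
    response = hook.conn.get_model_customization_job(jobIdentifier=job_name)
    return response["status"]
```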
Re: [PR] Make _run_task_by_local_task_job compatible with internal API [airflow]
dstandish commented on code in PR #38563: URL: https://github.com/apache/airflow/pull/38563#discussion_r1556179515 ## airflow/cli/commands/task_command.py: ## @@ -278,7 +279,15 @@ def _run_task_by_local_task_job(args, ti: TaskInstance | TaskInstancePydantic) - external_executor_id=_extract_external_executor_id(args), ) try: -ret = run_job(job=job_runner.job, execute_callable=job_runner._execute) +# If internal API is used, we must pass session=None so that one is not created +# by the provide_session decorator. All downstream functions that actually use +# a session are already set up for RPC +# todo: perhaps instead we should simply modify the provide_session decorator +# to *not* provide a session when using internal API Review Comment: i'm working on something like this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Setting use_beeline by default for hive cli connection [airflow]
eladkal commented on code in PR #38763: URL: https://github.com/apache/airflow/pull/38763#discussion_r1556176719 ## airflow/providers/apache/hive/CHANGELOG.rst: ## @@ -27,6 +27,19 @@ Changelog - +8.0.0 +. + + +Breaking changes + + +Changed the default value of ``use_beeline`` in hive cli connection to True. +Beeline will be always enabled by default in this connection type from now onwards. + +* ``Setting use_beeline by default for hive cli connection (#38763)`` Review Comment: This will be set automatically when generating docs ```suggestion ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow) branch main updated: Add lineage_job_namespace and lineage_job_name OpenLineage macros (#38829)
This is an automated email from the ASF dual-hosted git repository. mobuchowski pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new 093ab7e755 Add lineage_job_namespace and lineage_job_name OpenLineage macros (#38829) 093ab7e755 is described below commit 093ab7e7556bad9202e83e9fd6d968c50a5f7cb8 Author: Maxim Martynov AuthorDate: Mon Apr 8 20:17:14 2024 +0300 Add lineage_job_namespace and lineage_job_name OpenLineage macros (#38829) --- airflow/providers/openlineage/plugins/macros.py| 43 .../providers/openlineage/plugins/openlineage.py | 9 +++- airflow/providers/openlineage/utils/utils.py | 2 +- .../macros.rst | 59 ++ tests/providers/openlineage/plugins/test_macros.py | 36 ++--- 5 files changed, 109 insertions(+), 40 deletions(-) diff --git a/airflow/providers/openlineage/plugins/macros.py b/airflow/providers/openlineage/plugins/macros.py index 391b29495f..ddfceb3459 100644 --- a/airflow/providers/openlineage/plugins/macros.py +++ b/airflow/providers/openlineage/plugins/macros.py @@ -26,22 +26,41 @@ if TYPE_CHECKING: from airflow.models import TaskInstance +def lineage_job_namespace(): +""" +Macro function which returns Airflow OpenLineage namespace. + +.. seealso:: +For more information take a look at the guide: +:ref:`howto/macros:openlineage` +""" +return conf.namespace() + + +def lineage_job_name(task_instance: TaskInstance): +""" +Macro function which returns Airflow task name in OpenLineage format (`<dag_id>.<task_id>`). + +.. seealso:: +For more information take a look at the guide: +:ref:`howto/macros:openlineage` +""" +return get_job_name(task_instance) + + def lineage_run_id(task_instance: TaskInstance): """ -Macro function which returns the generated run id for a given task. +Macro function which returns the generated run id (UUID) for a given task. This can be used to forward the run id from a task to a child run so the job hierarchy is preserved. .. seealso:: -For more information on how to use this operator, take a look at the guide: +For more information take a look at the guide: :ref:`howto/macros:openlineage` """ -if TYPE_CHECKING: -assert task_instance.task - return OpenLineageAdapter.build_task_instance_run_id( dag_id=task_instance.dag_id, -task_id=task_instance.task.task_id, +task_id=task_instance.task_id, execution_date=task_instance.execution_date, try_number=task_instance.try_number, ) @@ -56,9 +75,13 @@ def lineage_parent_id(task_instance: TaskInstance): run so the job hierarchy is preserved. Child run can easily create ParentRunFacet from these information. ..
seealso:: -For more information on how to use this macro, take a look at the guide: +For more information take a look at the guide: :ref:`howto/macros:openlineage` """ -job_name = get_job_name(task_instance.task) -run_id = lineage_run_id(task_instance) -return f"{conf.namespace()}/{job_name}/{run_id}" +return "/".join( +( +lineage_job_namespace(), +lineage_job_name(task_instance), +lineage_run_id(task_instance), +) +) diff --git a/airflow/providers/openlineage/plugins/openlineage.py b/airflow/providers/openlineage/plugins/openlineage.py index a0be47a499..5927929588 100644 --- a/airflow/providers/openlineage/plugins/openlineage.py +++ b/airflow/providers/openlineage/plugins/openlineage.py @@ -19,7 +19,12 @@ from __future__ import annotations from airflow.plugins_manager import AirflowPlugin from airflow.providers.openlineage import conf from airflow.providers.openlineage.plugins.listener import get_openlineage_listener -from airflow.providers.openlineage.plugins.macros import lineage_parent_id, lineage_run_id +from airflow.providers.openlineage.plugins.macros import ( +lineage_job_name, +lineage_job_namespace, +lineage_parent_id, +lineage_run_id, +) class OpenLineageProviderPlugin(AirflowPlugin): @@ -32,5 +37,5 @@ class OpenLineageProviderPlugin(AirflowPlugin): name = "OpenLineageProviderPlugin" if not conf.is_disabled(): -macros = [lineage_run_id, lineage_parent_id] +macros = [lineage_job_namespace, lineage_job_name, lineage_run_id, lineage_parent_id] listeners = [get_openlineage_listener()] diff --git a/airflow/providers/openlineage/utils/utils.py b/airflow/providers/openlineage/utils/utils.py index fb2263b90d..1c777aff76 100644 --- a/airflow/providers/openlineage/utils/utils.py +++ b/airflow/providers/openlineage/utils/utils.py @@ -56,7 +56,7 @@ def get_operator_class(task: BaseOperator) -> type: return task.__class__ -def
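For reference, a sketch of how these macros would typically be consumed from a templated field, so a spawned job can attach itself to the parent run. The `macros.OpenLineageProviderPlugin` namespace is assumed from Airflow's plugin-macro convention and the plugin name in the diff above, and the environment variable name is made up:

```python
from airflow.operators.bash import BashOperator

# Hypothetical task: render the parent-run identity at runtime and hand it
# to a spawned process via its environment (env is a templated field).
spawn_child_job = BashOperator(
    task_id="spawn_child_job",
    bash_command="./run_child_job.sh",  # placeholder script
    env={
        "OPENLINEAGE_PARENT_ID": (
            "{{ macros.OpenLineageProviderPlugin.lineage_parent_id(task_instance) }}"
        ),
    },
)
```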
Re: [PR] Add lineage_job_namespace and lineage_job_name OpenLineage macros [airflow]
mobuchowski merged PR #38829: URL: https://github.com/apache/airflow/pull/38829 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Setting use_beeline by default for hive cli connection [airflow]
potiuk commented on PR #38763: URL: https://github.com/apache/airflow/pull/38763#issuecomment-2043244944 Approved with NITs -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Setting use_beeline by default for hive cli connection [airflow]
potiuk commented on code in PR #38763: URL: https://github.com/apache/airflow/pull/38763#discussion_r1556155471 ## airflow/providers/apache/hive/provider.yaml: ## @@ -25,6 +25,7 @@ state: ready source-date-epoch: 1709554960 # note that those versions are maintained by release manager - do not update them manually versions: + - 7.0.2 Review Comment: And a spelling issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Setting use_beeline by default for hive cli connection [airflow]
potiuk commented on code in PR #38763: URL: https://github.com/apache/airflow/pull/38763#discussion_r1556154848 ## airflow/providers/apache/hive/provider.yaml: ## @@ -25,6 +25,7 @@ state: ready source-date-epoch: 1709554960 # note that those versions are maintained by release manager - do not update them manually versions: + - 7.0.2 Review Comment: ```suggestion - 8.0.0 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow) branch main updated: Improve editable airflow installation by adding preinstalled deps (#38764)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git The following commit(s) were added to refs/heads/main by this push: new d409b8be96 Improve editable airflow installation by adding preinstalled deps (#38764) d409b8be96 is described below commit d409b8be966b51207e10129e7709e943a5dc7ac3 Author: Jarek Potiuk AuthorDate: Mon Apr 8 18:56:17 2024 +0200 Improve editable airflow installation by adding preinstalled deps (#38764) * Improve editable airflow installation by adding preinstalled deps While preparing for presentation about modernizing packaging setup I found out that we can slightly improve editable installation of airflow. So far we required a "devel" installation in order to make a working editable installation of airflow (because the devel extra was installing dependencies for all the pre-installed providers). However that required to synchronize the list of providers installed in `devel` dependency with the list of preinstalled providers, as well as it made `pip install -e .` resulting with a non-working version of Airflow (because it requires `fab` provider dependencies to start airflow webserver). This PR improves it - in editable mode, instead of adding the pre-installed providers, we add their dependencies. This way we can remove the pre-installed providers from the list of devel providers to install - because now the pre-installed provider dependencies are installed simply as "required" dependencies. As a result - simple `pip install -e .` should now result in fully working airflow installation - without all the devel goodies and without celery and kubernetes dependencies, but fully usable for sequential and local executor cases. Also reviewed and updated the comments in hatch_build.py to better reflect the purpose and behaviour of some of the methods there. * Update hatch_build.py Co-authored-by: Ephraim Anierobi - Co-authored-by: Ephraim Anierobi --- airflow_pre_installed_providers.txt | 9 --- hatch_build.py | 150 ++-- 2 files changed, 110 insertions(+), 49 deletions(-) diff --git a/airflow_pre_installed_providers.txt b/airflow_pre_installed_providers.txt deleted file mode 100644 index e717ea696e..00 --- a/airflow_pre_installed_providers.txt +++ /dev/null @@ -1,9 +0,0 @@ -# List of all the providers that are pre-installed when you run `pip install apache-airflow` without extras -common.io -common.sql -fab>=1.0.2rc1 -ftp -http -imap -smtp -sqlite diff --git a/hatch_build.py b/hatch_build.py index 69b6da7a1c..532ee7d3e3 100644 --- a/hatch_build.py +++ b/hatch_build.py @@ -273,8 +273,6 @@ DEVEL_EXTRAS: dict[str, list[str]] = { "devel": [ "apache-airflow[celery]", "apache-airflow[cncf-kubernetes]", -"apache-airflow[common-io]", -"apache-airflow[common-sql]", "apache-airflow[devel-debuggers]", "apache-airflow[devel-devscripts]", "apache-airflow[devel-duckdb]", @@ -282,11 +280,6 @@ DEVEL_EXTRAS: dict[str, list[str]] = { "apache-airflow[devel-sentry]", "apache-airflow[devel-static-checks]", "apache-airflow[devel-tests]", -"apache-airflow[fab]", -"apache-airflow[ftp]", -"apache-airflow[http]", -"apache-airflow[imap]", -"apache-airflow[sqlite]", ], "devel-all-dbs": [ "apache-airflow[apache-cassandra]", @@ -550,11 +543,35 @@ ALL_DYNAMIC_EXTRAS: list[str] = sorted( def get_provider_id(provider_spec: str) -> str: -# in case provider_spec is "=" -return provider_spec.split(">=")[0] +""" +Extract provider id from provider specification. 
+ +:param provider_spec: provider specification can be in the form of the "PROVIDER_ID" or + "apache-airflow-providers-PROVIDER", optionally followed by ">=VERSION". + +:return: short provider_id with `.` instead of `-` in case of `apache` and other providers with + `-` in the name. +""" +_provider_id = provider_spec.split(">=")[0] +if _provider_id.startswith("apache-airflow-providers-"): +_provider_id = _provider_id.replace("apache-airflow-providers-", "").replace("-", ".") +return _provider_id def get_provider_requirement(provider_spec: str) -> str: +""" +Convert provider specification with provider_id to provider requirement. + +The requirement can be used when constructing dependencies. It automatically adds pre-release specifier +in case we are building pre-release version of Airflow. This way we can handle the case when airflow +depends on specific version of the provider that has not yet been released - then we release the +pre-release version of
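Given the two spec forms the new docstring accepts, the helper's behaviour can be illustrated with a couple of calls (a sketch mirroring the function shown in the diff above, not part of the commit):

```python
def get_provider_id(provider_spec: str) -> str:
    # Same logic as the helper in hatch_build.py shown above.
    _provider_id = provider_spec.split(">=")[0]
    if _provider_id.startswith("apache-airflow-providers-"):
        _provider_id = _provider_id.replace("apache-airflow-providers-", "").replace("-", ".")
    return _provider_id

assert get_provider_id("fab>=1.0.2rc1") == "fab"
assert get_provider_id("apache-airflow-providers-common-sql") == "common.sql"
```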
Re: [PR] Improve editable airflow installation by adding preinstalled deps [airflow]
potiuk merged PR #38764: URL: https://github.com/apache/airflow/pull/38764 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Give `on_task_instance_failed` access to the error that caused the failure [airflow]
potiuk commented on code in PR #38155: URL: https://github.com/apache/airflow/pull/38155#discussion_r1556148549 ## airflow/models/taskinstance.py: ## @@ -1409,7 +1410,9 @@ class TaskInstance(Base, LoggingMixin): cascade="all, delete, delete-orphan", ) note = association_proxy("task_instance_note", "content", creator=_creator_note) + task: Operator | None = None +_thread_local_data = threading.local() Review Comment: I just looked at this more closely. Hmm. I don't think we should add the thread local data to task instance - why would we need to do it? The whole point of thread local data is that you can access it directly without setting a variable on the model object that is passed down the stack. What I **really** thought about is literally two methods in `taskinstance.py`: ```python def set_last_error(error: None | str | BaseException): ``` ```python def get_last_error() -> None | str | BaseException: ``` Both using ThreadLocal. No passing extra field in TaskInstance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
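A minimal sketch of the module-level pair described in the comment (names taken from the comment itself; not the PR's final implementation):

```python
from __future__ import annotations

import threading

_error_data = threading.local()  # one independent slot per thread

def set_last_error(error: None | str | BaseException) -> None:
    _error_data.error = error

def get_last_error() -> None | str | BaseException:
    # The attribute only exists on threads that called set_last_error().
    return getattr(_error_data, "error", None)
```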
Re: [I] Databricks provider does not support Azure managed identity if more than one identity exists (e.g. ADF, AKS) [airflow]
potiuk commented on issue #38762: URL: https://github.com/apache/airflow/issues/38762#issuecomment-2043197124

> Perhaps one of the project maintainers could kindly suggest it if they have open channels of communication with Microsoft about their commercial offerings?

Not as far as I know. But we would love the Microsoft team to contribute to this and other relevant features.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Replace dill package to use cloudpickle [airflow]
potiuk commented on code in PR #38531: URL: https://github.com/apache/airflow/pull/38531#discussion_r1556118149

## airflow/models/taskinstance.py: ##
@@ -1287,7 +1287,7 @@ class TaskInstance(Base, LoggingMixin):
     queued_dttm = Column(UtcDateTime)
     queued_by_job_id = Column(Integer)
     pid = Column(Integer)
-    executor_config = Column(ExecutorConfigType(pickler=dill))
+    executor_config = Column(ExecutorConfigType(pickler=cloudpickle))

Review Comment: I believe we cannot use plain pickle. There are certain executors (the K8s executor is the main culprit) that can use configuration that is not picklable, because it can use V1Pod and similar classes for a custom pod template. See the example here, where you provide a custom Pod definition via executor_config passed to the task. Such a config is serialized when the DAG is parsed and stored in the DB, and deserialized by the executor when it attempts to execute it: https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/kubernetes_executor.html#pod-override

It's likely, however, that we could swap the default for `cloudpickle` (or maybe even choose whichever pickling library is available and importable - we could detect it, but then there is an issue of converting past tasks in case of a changeover, as @hussein-awala mentioned and implemented for encrypted kwargs: https://github.com/apache/airflow/pull/38358).

I still think it's possible to have the libraries as optional for core (and pre-install both in the airflow reference image), but it needs a bit of deliberation and the right errors/messaging when there are already tasks in the DB that were serialized using one of them but the library is missing - basically handling the case where someone had tasks created in the DB using `dill` and starts airflow without `dill`. We should raise an error and tell the user to install `dill` to make the on-the-fly conversion of old values.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
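For illustration, a hedged sketch of the "use whichever pickling library is importable" idea from the comment above; the module names are real, but the selection and error-messaging logic is an assumption, not the PR's implementation:

```python
# Hedged sketch: prefer cloudpickle, fall back to dill, and fail loudly when
# neither is available - the changeover/error-messaging concern raised above.
try:
    import cloudpickle as pickler
except ImportError:
    try:
        import dill as pickler
    except ImportError as err:
        raise RuntimeError(
            "Neither cloudpickle nor dill is installed; one of them is needed "
            "to deserialize executor_config values already stored in the DB."
        ) from err

# Round-trip an executor_config-like value, as ExecutorConfigType's pickler would.
blob = pickler.dumps({"KubernetesExecutor": {"image": "python:3.11"}})
assert pickler.loads(blob) == {"KubernetesExecutor": {"image": "python:3.11"}}
```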
Re: [I] Databricks provider does not support Azure managed identity if more than one identity exists (e.g. ADF, AKS) [airflow]
jtv8 commented on issue #38762: URL: https://github.com/apache/airflow/issues/38762#issuecomment-2043178028 Hi @ghjklw! I agree that it would be better to leverage the abstractions in the official Microsoft libraries - the current implementation is far too low level. However, that would require a substantial rewrite. I certainly don't have the time to do that, and to be honest even setting up a development environment and unit tests for the quick fix I've proposed is a bit beyond my comfort zone given my unfamiliarity with Airflow and its architecture. In an ideal world, I'd like to think Microsoft would contribute this work themselves, given that they now provide both Airflow and Databricks commercially as managed services, and one would expect them to work together via managed identities out of the box. Perhaps one of the project maintainers could kindly suggest it if they have open channels of communication with Microsoft about their commercial offerings? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow) branch main updated (285f037dbc -> 76477b0977)
This is an automated email from the ASF dual-hosted git repository. ephraimanierobi pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git from 285f037dbc Remove decorator from rendering fields example (#38827) add 76477b0977 Update version added field in config.yml (#38840) No new revisions were added by this update. Summary of changes: airflow/config_templates/config.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Re: [PR] Implement start/end/debug for multiple executors in scheduler job [airflow]
o-nikolas commented on code in PR #38514: URL: https://github.com/apache/airflow/pull/38514#discussion_r1556103085

## airflow/jobs/job.py: ##
@@ -104,12 +106,13 @@ class Job(Base, LoggingMixin):
     Only makes sense for SchedulerJob and BackfillJob instances.
     """
 
-    def __init__(self, executor=None, heartrate=None, **kwargs):
+    def __init__(self, executor: BaseExecutor | None = None, heartrate=None, **kwargs):
         # Save init parameters as DB fields
         self.heartbeat_failed = False
         self.hostname = get_hostname()
         if executor:
             self.executor = executor

Review Comment: I'll cut a good-first-issue for this :+1:

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Update version added field in config.yml [airflow]
ephraimbuddy merged PR #38840: URL: https://github.com/apache/airflow/pull/38840 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Chart: Default Airflow version to 2.9.0 [airflow]
ephraimbuddy commented on code in PR #38839: URL: https://github.com/apache/airflow/pull/38839#discussion_r1556101539

## chart/newsfragments/38478.significant.rst: ##

Review Comment: Thanks. Updated

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Deferrable mode for Custom Training Job operators [airflow]
potiuk commented on code in PR #38584: URL: https://github.com/apache/airflow/pull/38584#discussion_r1556100397

## airflow/providers/google/cloud/operators/vertex_ai/custom_job.py: ##
@@ -539,6 +564,94 @@ def on_kill(self) -> None:
         if self.hook:
             self.hook.cancel_job()
 
+    def execute_complete(self, context: Context, event: dict[str, Any]) -> dict[str, Any] | None:
+        if event["status"] == "error":
+            raise AirflowException(event["message"])
+        result = event["job"]
+        model_id = self.hook.extract_model_id_from_training_pipeline(result)
+        custom_job_id = self.hook.extract_custom_job_id_from_training_pipeline(result)
+        self.xcom_push(context, key="model_id", value=model_id)
+        VertexAIModelLink.persist(context=context, task_instance=self, model_id=model_id)
+        # push custom_job_id to xcom so it could be pulled by other tasks
+        self.xcom_push(context, key="custom_job_id", value=custom_job_id)
+        return result
+
+    def invoke_defer(self, context: Context) -> None:
+        custom_container_training_job_obj: CustomContainerTrainingJob = self.hook.submit_custom_container_training_job(
+            project_id=self.project_id,
+            region=self.region,
+            display_name=self.display_name,
+            command=self.command,
+            container_uri=self.container_uri,
+            model_serving_container_image_uri=self.model_serving_container_image_uri,
+            model_serving_container_predict_route=self.model_serving_container_predict_route,
+            model_serving_container_health_route=self.model_serving_container_health_route,
+            model_serving_container_command=self.model_serving_container_command,
+            model_serving_container_args=self.model_serving_container_args,
+            model_serving_container_environment_variables=self.model_serving_container_environment_variables,
+            model_serving_container_ports=self.model_serving_container_ports,
+            model_description=self.model_description,
+            model_instance_schema_uri=self.model_instance_schema_uri,
+            model_parameters_schema_uri=self.model_parameters_schema_uri,
+            model_prediction_schema_uri=self.model_prediction_schema_uri,
+            parent_model=self.parent_model,
+            is_default_version=self.is_default_version,
+            model_version_aliases=self.model_version_aliases,
+            model_version_description=self.model_version_description,
+            labels=self.labels,
+            training_encryption_spec_key_name=self.training_encryption_spec_key_name,
+            model_encryption_spec_key_name=self.model_encryption_spec_key_name,
+            staging_bucket=self.staging_bucket,
+            # RUN
+            dataset=Dataset(name=self.dataset_id) if self.dataset_id else None,
+            annotation_schema_uri=self.annotation_schema_uri,
+            model_display_name=self.model_display_name,
+            model_labels=self.model_labels,
+            base_output_dir=self.base_output_dir,
+            service_account=self.service_account,
+            network=self.network,
+            bigquery_destination=self.bigquery_destination,
+            args=self.args,
+            environment_variables=self.environment_variables,
+            replica_count=self.replica_count,
+            machine_type=self.machine_type,
+            accelerator_type=self.accelerator_type,
+            accelerator_count=self.accelerator_count,
+            boot_disk_type=self.boot_disk_type,
+            boot_disk_size_gb=self.boot_disk_size_gb,
+            training_fraction_split=self.training_fraction_split,
+            validation_fraction_split=self.validation_fraction_split,
+            test_fraction_split=self.test_fraction_split,
+            training_filter_split=self.training_filter_split,
+            validation_filter_split=self.validation_filter_split,
+            test_filter_split=self.test_filter_split,
+            predefined_split_column_name=self.predefined_split_column_name,
+            timestamp_split_column_name=self.timestamp_split_column_name,
+            tensorboard=self.tensorboard,
+        )
+        custom_container_training_job_obj.wait_for_resource_creation()

Review Comment: @Lee-W? What do you think? I left the other comment for you to resolve.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Deferrable mode for Custom Training Job operators [airflow]
potiuk commented on code in PR #38584: URL: https://github.com/apache/airflow/pull/38584#discussion_r1556098638

## airflow/providers/google/cloud/operators/vertex_ai/custom_job.py: ##
@@ -539,6 +564,94 @@ def on_kill(self) -> None:
         if self.hook:
             self.hook.cancel_job()
 
+    def execute_complete(self, context: Context, event: dict[str, Any]) -> dict[str, Any] | None:
+        if event["status"] == "error":
+            raise AirflowException(event["message"])
+        result = event["job"]
+        model_id = self.hook.extract_model_id_from_training_pipeline(result)
+        custom_job_id = self.hook.extract_custom_job_id_from_training_pipeline(result)
+        self.xcom_push(context, key="model_id", value=model_id)
+        VertexAIModelLink.persist(context=context, task_instance=self, model_id=model_id)
+        # push custom_job_id to xcom so it could be pulled by other tasks
+        self.xcom_push(context, key="custom_job_id", value=custom_job_id)
+        return result
+
+    def invoke_defer(self, context: Context) -> None:
+        custom_container_training_job_obj: CustomContainerTrainingJob = self.hook.submit_custom_container_training_job(
+            project_id=self.project_id,
+            region=self.region,
+            display_name=self.display_name,
+            command=self.command,
+            container_uri=self.container_uri,
+            model_serving_container_image_uri=self.model_serving_container_image_uri,
+            model_serving_container_predict_route=self.model_serving_container_predict_route,
+            model_serving_container_health_route=self.model_serving_container_health_route,
+            model_serving_container_command=self.model_serving_container_command,
+            model_serving_container_args=self.model_serving_container_args,
+            model_serving_container_environment_variables=self.model_serving_container_environment_variables,
+            model_serving_container_ports=self.model_serving_container_ports,
+            model_description=self.model_description,
+            model_instance_schema_uri=self.model_instance_schema_uri,
+            model_parameters_schema_uri=self.model_parameters_schema_uri,
+            model_prediction_schema_uri=self.model_prediction_schema_uri,
+            parent_model=self.parent_model,
+            is_default_version=self.is_default_version,
+            model_version_aliases=self.model_version_aliases,
+            model_version_description=self.model_version_description,
+            labels=self.labels,
+            training_encryption_spec_key_name=self.training_encryption_spec_key_name,
+            model_encryption_spec_key_name=self.model_encryption_spec_key_name,
+            staging_bucket=self.staging_bucket,
+            # RUN
+            dataset=Dataset(name=self.dataset_id) if self.dataset_id else None,
+            annotation_schema_uri=self.annotation_schema_uri,
+            model_display_name=self.model_display_name,
+            model_labels=self.model_labels,
+            base_output_dir=self.base_output_dir,
+            service_account=self.service_account,
+            network=self.network,
+            bigquery_destination=self.bigquery_destination,
+            args=self.args,
+            environment_variables=self.environment_variables,
+            replica_count=self.replica_count,
+            machine_type=self.machine_type,
+            accelerator_type=self.accelerator_type,
+            accelerator_count=self.accelerator_count,
+            boot_disk_type=self.boot_disk_type,
+            boot_disk_size_gb=self.boot_disk_size_gb,
+            training_fraction_split=self.training_fraction_split,
+            validation_fraction_split=self.validation_fraction_split,
+            test_fraction_split=self.test_fraction_split,
+            training_filter_split=self.training_filter_split,
+            validation_filter_split=self.validation_filter_split,
+            test_filter_split=self.test_filter_split,
+            predefined_split_column_name=self.predefined_split_column_name,
+            timestamp_split_column_name=self.timestamp_split_column_name,
+            tensorboard=self.tensorboard,
+        )
+        custom_container_training_job_obj.wait_for_resource_creation()

Review Comment: Ah, yeah. I just saw more context now; then yes - absolutely no problem with it in this case.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Remove decorator from rendering fields example [airflow]
potiuk merged PR #38827: URL: https://github.com/apache/airflow/pull/38827 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow) branch main updated: Remove decorator from rendering fields example (#38827)
This is an automated email from the ASF dual-hosted git repository. potiuk pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git

The following commit(s) were added to refs/heads/main by this push:
     new 285f037dbc  Remove decorator from rendering fields example (#38827)
285f037dbc is described below

commit 285f037dbcc0f5e23c2d6ac99bfd6cab86c96ac3
Author: Elad Kalif <45845474+elad...@users.noreply.github.com>
AuthorDate: Mon Apr 8 19:09:17 2024 +0300

    Remove decorator from rendering fields example (#38827)

    * Remove decorator from rendering fields example
    * fix
---
 docs/apache-airflow/core-concepts/operators.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/apache-airflow/core-concepts/operators.rst b/docs/apache-airflow/core-concepts/operators.rst
index c7193789bc..0330ef43fb 100644
--- a/docs/apache-airflow/core-concepts/operators.rst
+++ b/docs/apache-airflow/core-concepts/operators.rst
@@ -246,9 +246,9 @@ you can pass ``render_template_as_native_obj=True`` to the DAG as follows:
         return json.loads(data_string)
 
-    @task(task_id="transform")
     def transform(order_data):
         print(type(order_data))
+        total_order_value = 0
         for value in order_data.values():
             total_order_value += value
         return {"total_order_value": total_order_value}
Re: [PR] Update ACL during job reset [airflow]
subham611 commented on code in PR #38741: URL: https://github.com/apache/airflow/pull/38741#discussion_r1556066734

## airflow/providers/databricks/hooks/databricks.py: ##
@@ -655,6 +656,16 @@ def get_repo_by_path(self, path: str) -> str | None:
         return None
 
+    def update_job_permission(self, json: dict[str, Any]) -> dict:
+        """
+        Update databricks job permission
+
+        :param json: payload
+        :return:

Review Comment: added.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
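For context, a hedged sketch of what such a hook method might look like; `_do_api_call` is an existing DatabricksHook helper, but the endpoint path and the explicit `job_id` parameter here are assumptions for illustration, not the merged code:

```python
# Hedged sketch, not the PR's implementation: replace a Databricks job ACL via
# the permissions REST API, following the _do_api_call pattern used elsewhere
# in DatabricksHook. The endpoint shape is an assumption.
from typing import Any

JOB_PERMISSIONS_ENDPOINT = ("PUT", "api/2.0/permissions/jobs/{job_id}")


def update_job_permission(self, job_id: int, json: dict[str, Any]) -> dict:
    """Replace the ACL of a Databricks job.

    :param job_id: id of the job whose permissions are updated
    :param json: access_control_list payload
    :return: the API response with the effective permissions
    """
    method, path = JOB_PERMISSIONS_ENDPOINT
    return self._do_api_call((method, path.format(job_id=job_id)), json)
```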
Re: [PR] Add Weights and Biases Provider [airflow]
odaneau-astro closed pull request #38846: Add Weights and Biases Provider URL: https://github.com/apache/airflow/pull/38846 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Paginate Airflow task logs [airflow]
RNHTTR commented on code in PR #38807: URL: https://github.com/apache/airflow/pull/38807#discussion_r1556051546

## airflow/providers/amazon/aws/log/s3_task_handler.py: ##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment: > Does it even need to be API parameters, or could we do what S3 does, and do this as an HTTP Range request?

This is currently the plan.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
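A hedged sketch of how a page number could be translated into the HTTP Range header mentioned above; the 100 KiB page size mirrors the TODO in the diff and is an assumption, not a decided configuration:

```python
# Hedged sketch: map a page number to an HTTP/S3 byte-range header.
def page_to_range_header(page_number: int, page_size: int = 1024 * 100) -> str:
    start = page_number * page_size
    end = start + page_size - 1  # Range is inclusive on both ends
    return f"bytes={start}-{end}"


assert page_to_range_header(0) == "bytes=0-102399"
assert page_to_range_header(2) == "bytes=204800-307199"
```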
Re: [PR] Correctly select task in DAG Graph View when clicking on its name [airflow]
bbovenzi commented on code in PR #38782: URL: https://github.com/apache/airflow/pull/38782#discussion_r1556047065

## airflow/www/static/js/dag/details/graph/DagNode.tsx: ##
@@ -126,10 +125,6 @@ const DagNode = ({
   label={taskName}
   isOpen={isOpen}
   isGroup={!!childCount}
-  onClick={(e) => {
-    e.stopPropagation();

Review Comment: We need this for opening and closing task groups. But we can wrap it in an if statement `if (!!onToggleCollapse) {...` and have both functions inside. That way, the onClick will propagate to selecting the task if it is not a task group.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] cannot start 'airflow standalone'; pendulum.tz.timezone(tz) -> TypeError [airflow]
dhshah13 commented on issue #38714: URL: https://github.com/apache/airflow/issues/38714#issuecomment-2043070747

And downgrade the Python version to 3.11 - it is more stable.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [2.9.0] DB migrate throws na error on encrypt_trigger_kwargs [airflow]
potiuk commented on issue #38836: URL: https://github.com/apache/airflow/issues/38836#issuecomment-2043068257 cc: @hussein-awala -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
(airflow) branch main updated (87fc581262 -> bfaa4f2012)
This is an automated email from the ASF dual-hosted git repository. vincbeck pushed a change to branch main in repository https://gitbox.apache.org/repos/asf/airflow.git from 87fc581262 Remove false negative SparkInK8s internal deprecations (#38777) add bfaa4f2012 fix: COMMAND string should be raw to avoid SyntaxWarning: invalid escape sequence '\s' (#38734) No new revisions were added by this update. Summary of changes: airflow/providers/amazon/aws/hooks/eks.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Re: [PR] Fix eks.py SyntaxWarning: invalid escape sequence '\s' [airflow]
vincbeck merged PR #38734: URL: https://github.com/apache/airflow/pull/38734 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Amazon Bedrock - Model Customization Jobs [airflow]
vincbeck commented on code in PR #38693: URL: https://github.com/apache/airflow/pull/38693#discussion_r1556043966

## airflow/providers/amazon/aws/sensors/bedrock.py: ##
@@ -0,0 +1,111 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Any, Sequence
+
+from airflow.configuration import conf
+from airflow.providers.amazon.aws.sensors.base_aws import AwsBaseSensor
+from airflow.providers.amazon.aws.triggers.bedrock import BedrockCustomizeModelCompletedTrigger
+from airflow.providers.amazon.aws.utils.mixins import aws_template_fields
+
+if TYPE_CHECKING:
+    from airflow.utils.context import Context
+
+from airflow.exceptions import AirflowException, AirflowSkipException
+from airflow.providers.amazon.aws.hooks.bedrock import BedrockHook
+
+
+class BedrockCustomizeModelCompletedSensor(AwsBaseSensor[BedrockHook]):
+    """
+    Poll the state of the model customization job until it reaches a terminal state; fails if the job fails.
+
+    .. seealso::
+        For more information on how to use this sensor, take a look at the guide:
+        :ref:`howto/sensor:BedrockCustomizeModelCompletedSensor`
+
+    :param job_name: The name of the Bedrock model customization job.
+
+    :param deferrable: If True, the sensor will operate in deferrable mode. This mode requires aiobotocore
+        module to be installed.
+        (default: False, but can be overridden in config file by setting default_deferrable to True)
+    :param max_retries: Number of times before returning the current state. (default: 75)
+    :param poke_interval: Polling period in seconds to check for the status of the job. (default: 120)
+    :param aws_conn_id: The Airflow connection used for AWS credentials.
+        If this is ``None`` or empty then the default boto3 behaviour is used. If
+        running Airflow in a distributed manner and aws_conn_id is None or
+        empty, then default boto3 configuration would be used (and must be
+        maintained on each worker node).
+    :param region_name: AWS region_name. If not specified then the default boto3 behaviour is used.
+    :param verify: Whether or not to verify SSL certificates. See:
+        https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
+    :param botocore_config: Configuration dictionary (key-values) for botocore client. See:
+        https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
+    """
+
+    INTERMEDIATE_STATES = ("InProgress",)
+    FAILURE_STATES = ("Failed", "Stopping", "Stopped")
+    SUCCESS_STATES = ("Completed",)
+    FAILURE_MESSAGE = "Bedrock model customization job sensor failed."
+
+    aws_hook_class = BedrockHook
+    template_fields: Sequence[str] = aws_template_fields("job_name")
+    ui_color = "#66c3ff"
+
+    def __init__(
+        self,
+        *,
+        job_name: str,
+        max_retries: int = 75,
+        poke_interval: int = 120,
+        deferrable: bool = conf.getboolean("operators", "default_deferrable", fallback=False),
+        **kwargs: Any,
+    ) -> None:
+        super().__init__(**kwargs)
+        self.job_name = job_name
+        self.poke_interval = poke_interval
+        self.max_retries = max_retries
+        self.deferrable = deferrable
+
+    def execute(self, context: Context) -> Any:
+        if self.deferrable:
+            self.defer(
+                trigger=BedrockCustomizeModelCompletedTrigger(
+                    job_name=self.job_name,
+                    waiter_delay=int(self.poke_interval),
+                    waiter_max_attempts=self.max_retries,
+                    aws_conn_id=self.aws_conn_id,
+                ),
+                method_name="poke",
+            )
+        else:
+            super().execute(context=context)
+
+    def poke(self, context: Context) -> bool:
+        state = self.hook.get_customize_model_job_state(self.job_name)
+
+        if state in self.FAILURE_STATES:
+            # TODO: remove this if block when min_airflow_version is set to higher than 2.7.1
+            if self.soft_fail:
+                raise AirflowSkipException(self.FAILURE_MESSAGE)
+            raise
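A hedged usage sketch for the sensor quoted above; the job name and timing values are illustrative, and the import path assumes the file location shown in the diff:

```python
# Hedged sketch: wiring the sensor into a DAG task. The job name is made up,
# and deferrable=True hands polling off to the trigger instead of a worker slot.
from airflow.providers.amazon.aws.sensors.bedrock import BedrockCustomizeModelCompletedSensor

wait_for_customization = BedrockCustomizeModelCompletedSensor(
    task_id="wait_for_customization",
    job_name="my-customization-job",  # assumption: your Bedrock job name
    poke_interval=120,                # seconds between status checks
    max_retries=75,                   # checks before giving up
    deferrable=True,
)
```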
Re: [PR] fix: update `get_uri` to return URI in sqlAlchemy URI format [airflow]
rawwar commented on PR #38831: URL: https://github.com/apache/airflow/pull/38831#issuecomment-2043040070

@Taragolis, based on your comment [here](https://github.com/apache/airflow/issues/38195#issuecomment-2042644747)

> As far as I remember, we recently discussed this case in Slack. I guess better introduce sa_uri property (or some similar naming) which returns [sqlalchemy.engine.URL](https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.engine.URL) into the DbApiHook without any implementation, e.g. raise NotImplementedError, and implement it per hook which has an implementation for SQLAlchemy.

I guess we leave the implementation of `get_uri` dependent on the provider type. If so, do we really need an `sa_uri` property that raises `NotImplementedError`? Since the SQLAlchemy URI format is well-defined, we can consider using the connection to form the URI and raising an error if it has issues. What do you think?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Connection to Postgres failed due to special characters in a password [airflow]
rawwar commented on issue #38187: URL: https://github.com/apache/airflow/issues/38187#issuecomment-2043038768

@Taragolis, based on your comment [here](https://github.com/apache/airflow/issues/38195#issuecomment-2042644747)

> As far as I remember, we recently discussed this case in Slack. I guess better introduce sa_uri property (or some similar naming) which returns [sqlalchemy.engine.URL](https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.engine.URL) into the DbApiHook without any implementation, e.g. raise NotImplementedError, and implement it per hook which has an implementation for SQLAlchemy.

I guess we leave the implementation of `get_uri` dependent on the provider type. If so, do we really need an `sa_uri` property that raises `NotImplementedError`? Since the SQLAlchemy URI format is well-defined, we can consider using the connection to form the URI and raising an error if it has issues. What do you think?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
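For illustration, a hedged sketch of what such an `sa_uri`-style property could build on: SQLAlchemy's real `URL.create`, which escapes special characters in the password (the original bug in this issue) instead of relying on string concatenation. The connection values are made up:

```python
# Hedged sketch: build a SQLAlchemy URL from connection fields so a password
# like "p@ss/w:rd" is escaped safely rather than breaking the URI.
from sqlalchemy.engine import URL

uri = URL.create(
    drivername="postgresql+psycopg2",
    username="airflow",
    password="p@ss/w:rd",  # special characters are percent-encoded by URL.create
    host="localhost",
    port=5432,
    database="airflow",
)
print(uri.render_as_string(hide_password=False))
```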
[I] Change the UI color for ExternalSensor tasks [airflow]
Timelessprod opened a new issue, #38845: URL: https://github.com/apache/airflow/issues/38845

### Description

With the recent update of the graph display in DAGs, there's a visibility issue regarding external task sensors. Those tasks have a `#19647e` tile color while the status & operator name are displayed in `--chakra-colors-gray-500` (aka `#718096`) on top of it, which makes the text unreadable. You can see a screenshot below:

https://github.com/apache/airflow/assets/69579602/8497faa7-9a61-4e59-a0ff-576844cfe67f

I can submit a PR for it if needed, but I'm unsure if there's a chart regarding which color to give to what type of operator etc., so please tell me. My idea would be to use the same tint but a lighter color like `#4db7db`, which is more readable (see below) but not perfect for accessibility:

https://github.com/apache/airflow/assets/69579602/e2fa562b-668f-4756-9ef0-421d5da3280e

### Use case/motivation

The current color makes monitoring and debugging very annoying as it's unreadable. It is also a big issue regarding accessibility, as color-blind people may not be able to guess the status name by the status color. As an argument, it was red-flagged by Chrome Lighthouse when I ran a test on the webpage.

### Related issues

_No response_

### Are you willing to submit a PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
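As a quick check of the readability claim, a hedged sketch computing the WCAG 2.x contrast ratio between the two colors named above; the formula constants are per WCAG, and the snippet is illustrative, not part of the Airflow codebase:

```python
# Hedged sketch: WCAG relative-luminance contrast between the tile color
# (#19647e) and the text color (#718096) reported in this issue.
def _luminance(hex_color: str) -> float:
    def channel(value: int) -> float:
        s = value / 255
        return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4

    r, g, b = (int(hex_color[i : i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)


def contrast(color_a: str, color_b: str) -> float:
    lighter, darker = sorted((_luminance(color_a), _luminance(color_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast("#718096", "#19647e"), 2))  # roughly 1.7, well below the 4.5:1 AA target
```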
Re: [I] Change the UI color for ExternalSensor tasks [airflow]
boring-cyborg[bot] commented on issue #38845: URL: https://github.com/apache/airflow/issues/38845#issuecomment-2043032243 Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] White INFO log is hard to read [airflow]
romanzdk opened a new issue, #38844: URL: https://github.com/apache/airflow/issues/38844

### Apache Airflow version

main (development)

### If "Other Airflow 2 version" selected, which one?

2.9.0

### What happened?

White INFO log is hard to read.

![image](https://github.com/apache/airflow/assets/31843161/aff517de-6a21-4043-b1a3-cd13156c9682)

### What you think should happen instead?

Use the normal black font color.

### How to reproduce

Use airflow 2.9.0 with some KubernetesPodOperator outputting random INFO logs.

### Operating System

Debian GNU/Linux 12 (bookworm)

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==8.19.0 apache-airflow-providers-celery==3.6.1 apache-airflow-providers-cncf-kubernetes==8.0.1 apache-airflow-providers-common-io==1.3.0 apache-airflow-providers-common-sql==1.11.1 apache-airflow-providers-fab==1.0.2 apache-airflow-providers-ftp==3.7.0 apache-airflow-providers-http==4.10.0 apache-airflow-providers-imap==3.5.0 apache-airflow-providers-postgres==5.10.2 apache-airflow-providers-smtp==1.6.1 apache-airflow-providers-sqlite==3.7.1

### Deployment

Other Docker-based deployment

### Deployment details

celery executor, on-premise kubernetes 1.26.5, postgres 13

### Anything else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
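If ANSI color codes in the returned logs are indeed the cause (see the earlier question in this thread), a hedged sketch of detecting and stripping them; the regex covers only CSI/SGR sequences and is illustrative, not Airflow's log-rendering code:

```python
# Hedged sketch: strip ANSI SGR color codes from a log line. SGR 37 is the
# white foreground that would wash out INFO lines on a light background.
import re

ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

line = "\x1b[37mINFO - pod started\x1b[0m"
print(ANSI_SGR.sub("", line))  # "INFO - pod started"
```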
Re: [PR] Update dagrun.py to fix #38816 [airflow]
boring-cyborg[bot] commented on PR #38843: URL: https://github.com/apache/airflow/pull/38843#issuecomment-2043013963

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst). Here are some useful points:

- Pay attention to the quality of your code (ruff, mypy and type annotations). Our [pre-commits](https://github.com/apache/airflow/blob/main/contributing-docs/08_static_code_checks.rst#prerequisites-for-pre-commit-hooks) will help you with that.
- In case of a new feature, add useful documentation (in docstrings or in the `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
- Consider using the [Breeze environment](https://github.com/apache/airflow/blob/main/dev/breeze/doc/README.rst) for testing locally; it's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
- Please follow the [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication, including (but not limited to) comments on Pull Requests, the mailing list and Slack.
- Be sure to read the [Airflow Coding style](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#coding-style-and-best-practices).
- Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better.

In case of doubts contact the developers at: Mailing List: d...@airflow.apache.org Slack: https://s.apache.org/airflow-slack

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Update dagrun.py to fix #38816 [airflow]
MyLong opened a new pull request, #38843: URL: https://github.com/apache/airflow/pull/38843

The 'DagRunNote' object has no attribute `dag_id`/`dagrun_id`/`run_id`/`map_index`, so replace those attributes with the existing attributes `dag_run_id`/`content`.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] OpenAI Chat & Assistant hook functions [airflow]
Lee-W commented on code in PR #38736: URL: https://github.com/apache/airflow/pull/38736#discussion_r1556009587

## airflow/providers/openai/hooks/openai.py: ##
@@ -77,6 +89,165 @@ def get_conn(self) -> OpenAI:
             **openai_client_kwargs,
         )
 
+    def create_chat_completion(
+        self,
+        messages: list[
+            ChatCompletionSystemMessageParam
+            | ChatCompletionUserMessageParam
+            | ChatCompletionAssistantMessageParam
+            | ChatCompletionToolMessageParam
+            | ChatCompletionFunctionMessageParam
+        ],
+        model: str = "gpt-3.5-turbo",
+        **kwargs: Any,
+    ) -> ChatCompletionMessage:
+        """
+        Create a model response for the given chat conversation and returns the message.
+
+        :param messages: A list of messages comprising the conversation so far
+        :param model: ID of the model to use
+        """
+        response = self.conn.chat.completions.create(model=model, messages=messages, **kwargs)
+        return response.choices[0].message
+
+    def create_assistant(self, model: str = "gpt-3.5-turbo", **kwargs: Any) -> Assistant:
+        """Create an OpenAI assistant using the given model.
+
+        :param model: The OpenAI model for the assistant to use.
+        """
+        assistant = self.conn.beta.assistants.create(model=model, **kwargs)
+        return assistant
+
+    def get_assistant(self, assistant_id: str) -> Assistant:
+        """
+        Get an OpenAI assistant.
+
+        :param assistant_id: The ID of the assistant to retrieve.
+        """
+        assistant = self.conn.beta.assistants.retrieve(assistant_id=assistant_id)
+        return assistant
+
+    def get_assistants(self, **kwargs) -> list[Assistant]:
+        """Get a list of Assistant objects."""
+        assistants = self.conn.beta.assistants.list(**kwargs)
+        return assistants.data
+
+    def get_assistant_with_name(self, assistant_name: str) -> Assistant | None:

Review Comment:
```suggestion
    def get_assistant_by_name(self, assistant_name: str) -> Assistant | None:
```
Not sure whether it sounds more reasonable.

## airflow/providers/openai/hooks/openai.py: ##
@@ -77,6 +89,165 @@ def get_conn(self) -> OpenAI:
             **openai_client_kwargs,
         )
 
+    def create_chat_completion(
+        self,
+        messages: list[
+            ChatCompletionSystemMessageParam
+            | ChatCompletionUserMessageParam
+            | ChatCompletionAssistantMessageParam
+            | ChatCompletionToolMessageParam
+            | ChatCompletionFunctionMessageParam
+        ],
+        model: str = "gpt-3.5-turbo",
+        **kwargs: Any,
+    ) -> ChatCompletionMessage:
+        """
+        Create a model response for the given chat conversation and returns the message.
+
+        :param messages: A list of messages comprising the conversation so far
+        :param model: ID of the model to use
+        """
+        response = self.conn.chat.completions.create(model=model, messages=messages, **kwargs)
+        return response.choices[0].message
+
+    def create_assistant(self, model: str = "gpt-3.5-turbo", **kwargs: Any) -> Assistant:
+        """Create an OpenAI assistant using the given model.
+
+        :param model: The OpenAI model for the assistant to use.
+        """
+        assistant = self.conn.beta.assistants.create(model=model, **kwargs)
+        return assistant
+
+    def get_assistant(self, assistant_id: str) -> Assistant:
+        """
+        Get an OpenAI assistant.
+
+        :param assistant_id: The ID of the assistant to retrieve.
+        """
+        assistant = self.conn.beta.assistants.retrieve(assistant_id=assistant_id)
+        return assistant
+
+    def get_assistants(self, **kwargs) -> list[Assistant]:

Review Comment:
```suggestion
    def get_assistants(self, **kwargs: Any) -> list[Assistant]:
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
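A hedged sketch of the renamed lookup suggested above, written here as a standalone function with the OpenAI client passed in purely for illustration; it is not the PR's implementation:

```python
# Hedged sketch: find an assistant by display name, or None if absent.
from openai import OpenAI
from openai.types.beta import Assistant


def get_assistant_by_name(client: OpenAI, assistant_name: str) -> Assistant | None:
    """Return the first assistant whose name matches, or None."""
    for assistant in client.beta.assistants.list().data:
        if assistant.name == assistant_name:
            return assistant
    return None
```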
Re: [PR] fix: try002 for provider google [airflow]
dondaum commented on code in PR #38803: URL: https://github.com/apache/airflow/pull/38803#discussion_r1556001809

## airflow/providers/google/cloud/hooks/dataflow.py: ##
@@ -252,7 +252,7 @@ def _get_current_jobs(self) -> list[dict]:
             self._job_id = jobs[0]["id"]
             return jobs
         else:
-            raise Exception("Missing both dataflow job ID and name.")
+            raise AirflowException("Missing both dataflow job ID and name.")

Review Comment: Good point. I checked again and yes, it makes sense to me to rather use `ValueError`. Changed.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
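A minimal sketch of the convention settled on above, with the argument names assumed from the surrounding hook: missing or invalid arguments raise `ValueError` rather than a blanket `Exception` or `AirflowException`:

```python
# Hedged sketch: argument validation raises ValueError, per the review outcome.
def _get_current_jobs(job_id: str | None, job_name: str | None) -> None:
    if not job_id and not job_name:
        raise ValueError("Missing both dataflow job ID and name.")
```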
Re: [I] Applying queue with SparkSubmitOperator is ignored from 2.6.0 onwards [airflow]
pateash commented on issue #38461: URL: https://github.com/apache/airflow/issues/38461#issuecomment-2042992467

Checking this. Can anyone please assign this to me?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Expose AWS IAM missing param in Hashicorp secret [airflow]
Lee-W commented on code in PR #38536: URL: https://github.com/apache/airflow/pull/38536#discussion_r1555990209

## airflow/providers/hashicorp/provider.yaml: ##
@@ -55,6 +55,7 @@ versions:
 dependencies:
   - apache-airflow>=2.6.0
   - hvac>=1.1.0
+  - apache-airflow-providers-amazon>=8.0.0

Review Comment: Should (or could) we make this an extra dependency for those who need to use AWS?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
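If it were made an extra, a hedged sketch of the usual optional-provider guard; `AirflowOptionalProviderFeatureException` is a real Airflow exception, but the `[amazon]` extra name here is an assumption, not an existing extra of the hashicorp provider:

```python
# Hedged sketch: import the AWS hook lazily and fail with an actionable
# message when the optional amazon dependency is absent.
from airflow.exceptions import AirflowOptionalProviderFeatureException

try:
    from airflow.providers.amazon.aws.hooks.base_aws import AwsBaseHook
except ImportError as err:
    raise AirflowOptionalProviderFeatureException(
        "IAM auth requires the amazon provider; install "
        "'apache-airflow-providers-hashicorp[amazon]' (assumed extra) to enable it."
    ) from err
```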
Re: [PR] Bugfix to correct GCSHook being called even when not required with BeamRunPythonPipelineOperator [airflow]
zstrathe commented on code in PR #38716: URL: https://github.com/apache/airflow/pull/38716#discussion_r1555982325

## airflow/providers/apache/beam/operators/beam.py: ##
@@ -364,11 +364,13 @@ def execute(self, context: Context):
 
     def execute_sync(self, context: Context):
         with ExitStack() as exit_stack:
-            gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
             if self.py_file.lower().startswith("gs://"):
+                gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
                 tmp_gcs_file = exit_stack.enter_context(gcs_hook.provide_file(object_url=self.py_file))
                 self.py_file = tmp_gcs_file.name
             if self.snake_case_pipeline_options.get("requirements_file", "").startswith("gs://"):
+                if 'gcs_hook' not in locals():

Review Comment: @e-galan thanks for the feedback! I will go ahead and remove that check then.

In addition, after looking through the code some more, it looks like other GCS resources may not be instantiated if supplied as pipeline_options. For example, the below from tests/system/providers/apache/beam/example_python.py:

```
start_python_pipeline_direct_runner = BeamRunPythonPipelineOperator(
    task_id="start_python_pipeline_direct_runner",
    py_file=GCS_PYTHON,
    py_options=[],
    pipeline_options={"output": GCS_OUTPUT},
    py_requirements=["apache-beam[gcp]==2.46.0"],
    py_interpreter="python3",
    py_system_site_packages=False,
)
```

Would `pipeline_options={"output": GCS_OUTPUT}` correctly result in the output file being updated in GCS? If not, I think that every value in pipeline_options should be recursively parsed for conversion to a GCS resource, with the complication that "output" files would need to utilize `GCSHook.provide_file_and_upload()` instead of `GCSHook.provide_file()`. I think that could be solved by adding another file prefix to distinguish GCS resources that need to be uploaded (i.e., `gcs-upload://`) and updating the docs to note that. If this all sounds correct, I'd be happy to add to this PR.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
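A hedged sketch of the recursive pipeline_options scan proposed above; the `gcs-upload://` prefix is the commenter's suggestion, not an existing Beam provider convention, and the function is illustrative rather than PR code:

```python
# Hedged sketch: walk pipeline_options and yield every value that names a GCS
# resource, distinguishing download ("gs://") from upload ("gcs-upload://",
# a proposed prefix) so the caller can pick provide_file vs
# provide_file_and_upload.
def find_gcs_refs(options, path=()):
    """Yield (key-path, value) pairs whose value points at a GCS resource."""
    if isinstance(options, dict):
        for key, value in options.items():
            yield from find_gcs_refs(value, path + (key,))
    elif isinstance(options, (list, tuple)):
        for index, value in enumerate(options):
            yield from find_gcs_refs(value, path + (index,))
    elif isinstance(options, str) and options.startswith(("gs://", "gcs-upload://")):
        yield path, options


opts = {"output": "gcs-upload://bucket/out", "temp": ["gs://bucket/tmp"]}
assert dict(find_gcs_refs(opts)) == {
    ("output",): "gcs-upload://bucket/out",
    ("temp", 0): "gs://bucket/tmp",
}
```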