Re: [PR] Paginate Airflow task logs [airflow]
github-actions[bot] closed pull request #38807: Paginate Airflow task logs URL: https://github.com/apache/airflow/pull/38807 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Paginate Airflow task logs [airflow]
github-actions[bot] commented on PR #38807: URL: https://github.com/apache/airflow/pull/38807#issuecomment-2130540308 This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Re: [PR] Paginate Airflow task logs [airflow]
uranusjr commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1556841592

## airflow/providers/amazon/aws/log/s3_task_handler.py: ##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
I think what Ash means is that we take the Range header in the API and forward it to the log server.
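To make the forwarding idea concrete, here is a minimal sketch of how an API layer might pick a `Range` header out of an incoming request and pass it along unchanged. The function name `range_header_to_forward` and the choice to accept only a single byte range are assumptions made for illustration; nothing here is taken from the PR or from Airflow's API code.

```python
import re
from typing import Dict, Mapping

# A single byte range: "bytes=0-499", "bytes=500-" or "bytes=-500".
_SINGLE_RANGE = re.compile(r"bytes=(\d*)-(\d*)")


def range_header_to_forward(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Return the headers to send to the log server (hypothetical helper).

    If the client sent a Range header, it is validated and forwarded as-is;
    otherwise no Range header is forwarded and the full log is requested.
    """
    for name, value in incoming.items():
        # HTTP header names are case-insensitive.
        if name.lower() != "range":
            continue
        value = value.strip()
        match = _SINGLE_RANGE.fullmatch(value)
        # "bytes=-" has neither a start nor an end, so it is rejected too.
        if match is None or (match.group(1) == "" and match.group(2) == ""):
            raise ValueError(f"unsupported Range header: {value!r}")
        return {"Range": value}
    return {}
```

With this shape the API does not need its own paging parameters: whatever range the client asks for is what the log server is asked for.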
Re: [PR] Paginate Airflow task logs [airflow]
RNHTTR commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1556051546

## airflow/providers/amazon/aws/log/s3_task_handler.py: ##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
> Does it even need to be API parameters, or could we do what S3 does, and do this as an HTTP Range request?

This is currently the plan.
Re: [PR] Paginate Airflow task logs [airflow]
ashb commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1555658225

## airflow/providers/amazon/aws/log/s3_task_handler.py: ##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
Does it even need to be API parameters, or could we do what S3 does and do this as an HTTP Range request?
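For reference, this is roughly what a ranged read from S3 looks like. It is a sketch under assumptions: it calls a boto3-style client's `get_object` with its `Range` parameter directly instead of going through `self.hook.read_key`, and `read_s3_range` and `FakeS3Client` are hypothetical names. The fake client is only there so the snippet runs without AWS credentials.

```python
import io
from typing import Any


def read_s3_range(client: Any, bucket: str, key: str, start: int, end: int) -> str:
    """Fetch bytes start..end (inclusive) of an S3 object and decode them."""
    response = client.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    # A byte range can cut a multi-byte UTF-8 character in half, so decode
    # leniently instead of raising on a partial character at either edge.
    return response["Body"].read().decode("utf-8", errors="replace")


class FakeS3Client:
    """Stand-in for boto3.client("s3"); supports only 'bytes=start-end'."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def get_object(self, Bucket: str, Key: str, Range: str) -> dict:
        start, end = Range[len("bytes="):].split("-")
        # HTTP ranges are inclusive, Python slices are not, hence the + 1.
        return {"Body": io.BytesIO(self._data[int(start): int(end) + 1])}


client = FakeS3Client(b"line 1\nline 2\n")
print(read_s3_range(client, "some-bucket", "some/key.log", 0, 5))  # prints: line 1
```

Against real S3 the same function would be called with `boto3.client("s3")`, and only the requested bytes would be transferred.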
Re: [PR] Paginate Airflow task logs [airflow]
uranusjr commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1555488988

## airflow/providers/amazon/aws/log/s3_task_handler.py: ##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
I wonder if we should just put this in the API interface. So instead of a page number (of fixed-size pages), the API user would pass `offset=0&limit=500`, then `offset=500&limit=500`, and so on. (I took the argument names from SQL, but other suggestions are very much welcome.)
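As an illustration of the `offset`/`limit` suggestion, the pair maps onto an HTTP byte range with a one-line calculation. This sketch assumes both values are counted in bytes (the comment above does not say whether they would be bytes or lines), and `byte_range_header` is a hypothetical helper name.

```python
def byte_range_header(offset: int, limit: int) -> str:
    """Translate an offset/limit pair (in bytes) into a Range header value."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if limit <= 0:
        raise ValueError("limit must be positive")
    # HTTP byte ranges are inclusive on both ends, so the last byte
    # requested is offset + limit - 1.
    return f"bytes={offset}-{offset + limit - 1}"


print(byte_range_header(0, 500))    # prints: bytes=0-499
print(byte_range_header(500, 500))  # prints: bytes=500-999
```

Unlike a fixed `page_size`, this leaves the chunk size up to the caller, so no server-side page-size config is needed.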
[PR] Paginate Airflow task logs [airflow]
RNHTTR opened a new pull request, #38807: URL: https://github.com/apache/airflow/pull/38807 It's relatively easy for the webserver to get overwhelmed with large log files served from remote blob storage. Instead of just throwing more memory at the webserver, this seeks to paginate log files served from remote blob storage (at least with S3 to start).