Re: [PR] Paginate Airflow task logs [airflow]

2024-05-29 Thread via GitHub


github-actions[bot] closed pull request #38807: Paginate Airflow task logs
URL: https://github.com/apache/airflow/pull/38807


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] Paginate Airflow task logs [airflow]

2024-05-24 Thread via GitHub


github-actions[bot] commented on PR #38807:
URL: https://github.com/apache/airflow/pull/38807#issuecomment-2130540308

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed in 5 days if no further activity occurs. 
Thank you for your contributions.





Re: [PR] Paginate Airflow task logs [airflow]

2024-04-08 Thread via GitHub


uranusjr commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1556841592


##
airflow/providers/amazon/aws/log/s3_task_handler.py:
##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
   I think what Ash means is that we take the Range header in the API and 
forward it to the log server.
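A minimal sketch of what that forwarding could look like, assuming the API handler sees the incoming request headers as a plain dict. `parse_byte_range` and `forward_range_header` are hypothetical helper names, not existing Airflow functions. Only the single-range `bytes=start-end` and open-ended `bytes=start-` forms are accepted; anything else is dropped, so the log server would fall back to returning the full log.

```python
import re
from typing import Optional, Tuple

# Matches a single byte range such as "bytes=0-499" or the open-ended "bytes=500-".
# Multi-range ("bytes=0-9,20-29") and suffix ("bytes=-500") forms are not handled.
_RANGE_RE = re.compile(r"^bytes=(\d+)-(\d*)$")


def parse_byte_range(header: Optional[str]) -> Optional[Tuple[int, Optional[int]]]:
    """Return (start, end) from a Range header, or None if absent or unsupported.

    ``end`` is inclusive, as in HTTP, and is None for an open-ended range.
    """
    if not header:
        return None
    match = _RANGE_RE.match(header.strip())
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        return None  # invalid range-spec per RFC 9110; safe to ignore the header
    return start, end


def forward_range_header(incoming_headers: dict) -> dict:
    """Build headers for the log-server request, forwarding a valid Range header only."""
    parsed = parse_byte_range(incoming_headers.get("Range"))
    if parsed is None:
        return {}
    start, end = parsed
    return {"Range": f"bytes={start}-{'' if end is None else end}"}
```

Re-serializing the parsed value, rather than copying the raw header string, means only ranges the sketch understands ever reach the log server.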






Re: [PR] Paginate Airflow task logs [airflow]

2024-04-08 Thread via GitHub


RNHTTR commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1556051546


##
airflow/providers/amazon/aws/log/s3_task_handler.py:
##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
   > Does it even need to be API parameters, or could we do what S3 does, and 
do this as an HTTP Range request?
   
   This is currently the plan.
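Under the Range-request plan, a fixed-size page still maps to a byte range with simple arithmetic. A sketch, assuming zero-based page numbers (the diff does not show which convention it uses); `page_to_range` is a hypothetical helper. HTTP byte ranges are inclusive at both ends, so the last byte of a page is `start + page_size - 1`.

```python
DEFAULT_PAGE_SIZE = 1024 * 100  # 100 KiB, the value hard-coded in the diff above


def page_to_range(page_number: int, page_size: int = DEFAULT_PAGE_SIZE) -> str:
    """Translate a zero-based page number into an HTTP Range header value."""
    if page_number < 0:
        raise ValueError("page_number must be >= 0")
    if page_size <= 0:
        raise ValueError("page_size must be > 0")
    start = page_number * page_size
    end = start + page_size - 1  # inclusive end, as HTTP requires
    return f"bytes={start}-{end}"
```

For example, `page_to_range(0)` gives `"bytes=0-102399"` and `page_to_range(2, 500)` gives `"bytes=1000-1499"`.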






Re: [PR] Paginate Airflow task logs [airflow]

2024-04-08 Thread via GitHub


ashb commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1555658225


##
airflow/providers/amazon/aws/log/s3_task_handler.py:
##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
   Does it even need to be API parameters, or could we do what S3 does, and do 
this as an HTTP Range request?
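For S3 itself, boto3's `get_object` accepts a `Range` argument in standard HTTP syntax, so a ranged read could look roughly like this. It is a sketch: `s3_read_range` is a hypothetical function that takes any object with a boto3-style `get_object` method (for instance a boto3 S3 client). It is not the `s3_read` method from the diff.

```python
from typing import Any, Optional


def s3_read_range(client: Any, bucket: str, key: str, byte_range: Optional[str] = None) -> str:
    """Read a whole S3 object, or only ``byte_range`` (e.g. "bytes=0-499") of it.

    ``client`` is assumed to behave like a boto3 S3 client; only get_object is used.
    """
    kwargs = {"Bucket": bucket, "Key": key}
    if byte_range is not None:
        kwargs["Range"] = byte_range  # same syntax as an HTTP Range header
    response = client.get_object(**kwargs)
    # errors="replace" because a byte range can split a multi-byte UTF-8 character.
    return response["Body"].read().decode("utf-8", errors="replace")
```

Decoding with `errors="replace"` is a deliberate choice: an arbitrary byte boundary can fall inside a multi-byte UTF-8 character, and showing a replacement character at a page edge seems preferable to failing the whole page.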






Re: [PR] Paginate Airflow task logs [airflow]

2024-04-08 Thread via GitHub


uranusjr commented on code in PR #38807:
URL: https://github.com/apache/airflow/pull/38807#discussion_r1555488988


##
airflow/providers/amazon/aws/log/s3_task_handler.py:
##
@@ -178,7 +180,13 @@ def s3_read(self, remote_log_location: str, return_error: bool = False) -> str:
         :return: the log found at the remote_log_location
         """
         try:
-            return self.hook.read_key(remote_log_location)
+            range: str = None
+            if page_number is not None:
+                page_size = 1024 * 100  # TODO: Create config for page_size

Review Comment:
   I wonder if we should just put this in the API. So instead of a page 
number (of fixed-size pages), the API user would do `offset=0&limit=500` 
and `offset=500&limit=500` and so on. (I took the argument names from SQL, but 
other suggestions are very much welcomed.)
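To illustrate the offset/limit semantics, a hypothetical `read_log_slice` could serve `offset=0&limit=500`, `offset=500&limit=500`, and so on from any seekable binary file object, returning the offset the client should request next. Counting bytes rather than lines is an assumption here; the comment does not say which unit the parameters would use.

```python
from typing import BinaryIO, Optional, Tuple


def read_log_slice(log: BinaryIO, offset: int = 0, limit: int = 500) -> Tuple[bytes, Optional[int]]:
    """Return up to ``limit`` bytes starting at ``offset``, plus the next offset.

    The next offset is None once a short (or empty) read shows the end of the
    log was reached, so a client can stop paging.
    """
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    log.seek(offset)
    chunk = log.read(limit)
    next_offset = offset + len(chunk) if len(chunk) == limit else None
    return chunk, next_offset
```

Unlike page numbers, this lets the caller choose the slice size per request, which is the flexibility the comment is after.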






[PR] Paginate Airflow task logs [airflow]

2024-04-06 Thread via GitHub


RNHTTR opened a new pull request, #38807:
URL: https://github.com/apache/airflow/pull/38807

   It's relatively easy for the webserver to get overwhelmed by large log 
files served from remote blob storage. Instead of just throwing more memory at 
the webserver, this PR seeks to paginate log files served from remote blob 
storage (starting with S3).
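The memory argument can be made concrete: instead of reading the whole remote file at once, the webserver can walk through it one page at a time. A minimal sketch, assuming a hypothetical `fetch_range(start, end)` callable that performs one ranged read of the inclusive byte range and returns fewer bytes than requested at the end of the log; `iter_log_pages` is likewise a hypothetical name.

```python
from typing import Callable, Iterator


def iter_log_pages(fetch_range: Callable[[int, int], bytes], page_size: int = 1024 * 100) -> Iterator[bytes]:
    """Yield successive pages of a remote log without holding all of it in memory.

    ``fetch_range(start, end)`` must return the bytes in the inclusive range
    [start, end]; a short or empty result is treated as the end of the log.
    """
    if page_size <= 0:
        raise ValueError("page_size must be > 0")
    start = 0
    while True:
        page = fetch_range(start, start + page_size - 1)
        if page:
            yield page
        if len(page) < page_size:
            return  # short read: nothing left to fetch
        start += page_size
```

One hedged caveat: S3 may answer a range that starts past the end of the object with an `InvalidRange` error instead of an empty body, so a real `fetch_range` backed by S3 would need to map that case to `b""`.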

