[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1402095422 The original issue was due to traversing over dag folder every time and also recursively inside zip files to construct paths like test_dag.zip/test_file.py which is IO expensive. The current approach just uses file_paths as in main branch and changes only query for zip files. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1401489859 @potiuk The initial issue was that file_paths would give "file.zip" but import error table had "file.zip/file_inside_zip.py". I initially suggested modifying file_paths function to return "file.zip/file_inside_zip.py" but since it was expensive to do it everytime a migration was suggested to add a new column in import error table to store "file.zip" for "file.zip/file_inside_zip.py" and use that in query so that both file_path values and the value in import error table are same. Currently, I modified the PR so that file_paths will continue to return "file.zip" as before and we will use `startswith` so that "file.zip/file_inside_zip.py" in the query similar to the migration solution without the need for any actual migration. Please review the updated PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1399416351 Updated code and tests since the `clear_nonexistent_import_errors` is now an internal API. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1399415469 Thanks @potiuk , I guess the following does the similar without a migration. Lets say there are 3 errors with c.py deleted. Then `self._file_paths` will be `["a.py", "b.zip"]` and it will create a query to find filenames that don't start with b.zip and not equal to a.py thus deleting only c.py error entry. - a.py - b.zip/c.py - c.py (file deleted) ```python query = session.query(errors.ImportError) if self._file_paths: for file_path in self._file_paths: if file_path.endswith(".zip"): query = query.filter(~(errors.ImportError.filename.startswith(file_path))) else: query = query.filter(errors.ImportError.filename !== file_path) query.delete(synchronize_session="fetch") session.commit() ``` ```python with create_session() as session: print(session.query(ImportError).filter(~(ImportError.filename.startswith("b.zip"))).filter(ImportError.filename != 'a.py')) SELECT import_error.id AS import_error_id, import_error.timestamp AS import_error_timestamp, import_error.filename AS import_error_filename, import_error.stacktrace AS import_error_stacktrace FROM import_error WHERE (import_error.filename NOT LIKE %(filename_1)s || '%%') AND import_error.filename != %(filename_2)s ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1399175298 I am not sure if there is a reliable way to extract "/home/karthikeyan/airflow/dags/error_dag.zip" from "/home/karthikeyan/airflow/dags/error_dag.zip/error_dag.py" through zipfile module for migration of existing errors. If it's just clearing based on zip file then wouldn't this become a query like below just deleting all zip file related errors though the zip files haven't changed? It might become like a query like below that already exists in processor.py ```python query = session.query(errors.ImportError) if self._file_paths: for file_path in self._file_paths: query = query.filter(errors.ImportError.filename.startswith(file_path)) query.delete(synchronize_session="fetch") session.commit() ``` https://github.com/apache/airflow/blob/09b3a29972430e5749d772359692fe4a9d528e48/airflow/dag_processing/processor.py#L563-L566 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1397945755 @potiuk @uranusjr Sorry, I am little confused. Currently `ImportError` model has filename column that stores full path to Python along with zipfile like `/home/karthikeyan/airflow/dags/error_dag.zip/error_dag.py` . Do you want to add a migration where `file_path` column stores `/home/karthikeyan/airflow/dags/error_dag.zip` value and use it in the deletion query? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1344459058 Thanks @potiuk, I updated my branch with main before creating the PR but it seems some other PRs were merged after my PR. I will rebase in future in case of test failures. Thanks again. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1344327963 The test failure seems to be unrelated to the pull request. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.
tirkarthi commented on PR #28256: URL: https://github.com/apache/airflow/pull/28256#issuecomment-1344236864 Related commit that added full path to Python file inside zip file to show all import errors : 97f00654ca9c43ec87e1f3ec207b0f935dd89e15 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org