[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-01-24 Thread via GitHub


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1402095422

   The original issue was due to traversing over dag folder every time and also 
recursively inside zip files to construct paths like test_dag.zip/test_file.py 
which is IO expensive. The current approach just uses file_paths as in main 
branch and changes only query for zip files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-01-23 Thread via GitHub


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1401489859

   @potiuk  The initial issue was that file_paths would give "file.zip" but 
import error table had "file.zip/file_inside_zip.py". I initially suggested 
modifying file_paths function to return "file.zip/file_inside_zip.py" but since 
it was expensive to do it everytime a migration was suggested to add a new 
column in import error table to store "file.zip" for 
"file.zip/file_inside_zip.py" and use that in query so that both file_path 
values and the value in import error table are same.
   
   Currently, I modified the PR so that file_paths will continue to return 
"file.zip" as before and we will use `startswith` so that 
"file.zip/file_inside_zip.py" in the query similar to the migration solution 
without the need for any actual migration. Please review the updated PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-01-21 Thread via GitHub


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1399416351

   Updated code and tests since the `clear_nonexistent_import_errors` is now an 
internal API.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-01-21 Thread via GitHub


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1399415469

   Thanks @potiuk , I guess the following does the similar without a migration. 
Lets say there are 3 errors with c.py deleted. Then `self._file_paths` will be 
`["a.py", "b.zip"]` and it will create a query to find filenames that don't 
start with b.zip and not equal to a.py thus deleting only c.py error entry.
   
   - a.py
   - b.zip/c.py
   - c.py (file deleted) 
   
   ```python
   query = session.query(errors.ImportError)
   if self._file_paths:
   for file_path in self._file_paths:
   if file_path.endswith(".zip"):
   query = 
query.filter(~(errors.ImportError.filename.startswith(file_path)))
   else:
   query = query.filter(errors.ImportError.filename !== file_path)
   query.delete(synchronize_session="fetch")
   session.commit()
   ```
   
   ```python
   with create_session() as session:
   
print(session.query(ImportError).filter(~(ImportError.filename.startswith("b.zip"))).filter(ImportError.filename
 != 'a.py'))
   
   SELECT import_error.id AS import_error_id, import_error.timestamp AS 
import_error_timestamp, import_error.filename AS import_error_filename, 
import_error.stacktrace AS import_error_stacktrace 
   FROM import_error 
   WHERE (import_error.filename NOT LIKE %(filename_1)s || '%%') AND 
import_error.filename != %(filename_2)s
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-01-20 Thread via GitHub


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1399175298

   I am not sure if there is a reliable way to extract 
"/home/karthikeyan/airflow/dags/error_dag.zip" from 
"/home/karthikeyan/airflow/dags/error_dag.zip/error_dag.py" through zipfile 
module for migration of existing errors. If it's just clearing based on zip 
file then wouldn't this become a query like below just deleting all zip file 
related errors though the zip files haven't changed? It might become like a 
query like below that already exists in processor.py
   
   ```python
   query = session.query(errors.ImportError)
   if self._file_paths:
   for file_path in self._file_paths:
   query = 
query.filter(errors.ImportError.filename.startswith(file_path))
   query.delete(synchronize_session="fetch")
   session.commit() 
   ```
   
   
https://github.com/apache/airflow/blob/09b3a29972430e5749d772359692fe4a9d528e48/airflow/dag_processing/processor.py#L563-L566


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2023-01-19 Thread GitBox


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1397945755

   @potiuk @uranusjr Sorry, I am little confused. Currently `ImportError` model 
has filename column that stores full path to Python along with zipfile like 
`/home/karthikeyan/airflow/dags/error_dag.zip/error_dag.py` . Do you want to 
add a migration where `file_path` column stores 
`/home/karthikeyan/airflow/dags/error_dag.zip` value and use it in the deletion 
query?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2022-12-09 Thread GitBox


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1344459058

   Thanks @potiuk, I updated my branch with main before creating the PR but it 
seems some other PRs were merged after my PR. I will rebase in future in case 
of test failures. Thanks again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2022-12-09 Thread GitBox


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1344327963

   The test failure seems to be unrelated to the pull request.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] tirkarthi commented on pull request #28256: Include full path to Python files under zip path while clearing import errors.

2022-12-09 Thread GitBox


tirkarthi commented on PR #28256:
URL: https://github.com/apache/airflow/pull/28256#issuecomment-1344236864

   Related commit that added full path to Python file inside zip file to show 
all import errors : 97f00654ca9c43ec87e1f3ec207b0f935dd89e15


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org