[jira] [Created] (IMPALA-13015) Dataload fails due to concurrency issue with test.jceks
Joe McDonnell created IMPALA-13015:
--

Summary: Dataload fails due to concurrency issue with test.jceks
Key: IMPALA-13015
URL: https://issues.apache.org/jira/browse/IMPALA-13015
Project: IMPALA
Issue Type: Bug
Components: Infrastructure
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell

When doing dataload locally, it fails with this error:
{noformat}
Traceback (most recent call last):
  File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 523, in <module>
    if __name__ == "__main__": main()
  File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 322, in main
    os.remove(jceks_path)
OSError: [Errno 2] No such file or directory: '/home/joemcdonnell/upstream/Impala/testdata/jceks/test.jceks'
Background task Loading functional-query data (pid 501094) failed.
{noformat}
testdata/bin/create-load-data.sh calls bin/load-data.py for functional, TPC-H, and TPC-DS in parallel, so this logic has race conditions:
{noformat}
jceks_path = TESTDATA_JCEKS_DIR + "/test.jceks"
if os.path.exists(jceks_path):
  os.remove(jceks_path)
{noformat}
I don't see a specific reason for this to be in bin/load-data.py. It should be moved somewhere else that doesn't run in parallel. One possible location is to add a step in testdata/bin/create-load-data.sh.

This was introduced in [https://github.com/apache/impala/commit/9837637d9342a49288a13a421d4e749818da1432]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
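
To illustrate the race: when several load-data.py processes run at once, two of them can both see test.jceks via os.path.exists() before either calls os.remove(), and the slower one then gets the OSError shown in the traceback. Below is a minimal sketch of a removal that tolerates the file already being gone. The helper name remove_if_present and the scratch directory are hypothetical and do not appear in load-data.py. This only illustrates the failure mode; it is not the fix proposed in this issue, which is to move the step out of the parallel path.

```python
import errno
import os
import tempfile


def remove_if_present(path):
    """Delete path. Return True if this call removed it, False if it was already gone."""
    try:
        os.remove(path)
        return True
    except OSError as e:
        # ENOENT means the file did not exist, e.g. another process removed it first.
        # Any other error (permissions, path is a directory, ...) is re-raised.
        if e.errno != errno.ENOENT:
            raise
        return False


# Demonstration in a scratch directory standing in for testdata/jceks (assumption).
scratch_dir = tempfile.mkdtemp()
demo_path = os.path.join(scratch_dir, "test.jceks")
open(demo_path, "w").close()

first = remove_if_present(demo_path)   # file exists, so this call removes it
second = remove_if_present(demo_path)  # file already gone, no exception raised
os.rmdir(scratch_dir)
```

Attempting the removal and handling the "no such file" case avoids the gap between the existence check and the delete, but each parallel process would still try to delete a shared file, which may be why doing it once in a non-parallel step is the cleaner option.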