[jira] [Created] (IMPALA-13015) Dataload fails due to concurrency issue with test.jceks

2024-04-18 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-13015:
--

 Summary: Dataload fails due to concurrency issue with test.jceks
 Key: IMPALA-13015
 URL: https://issues.apache.org/jira/browse/IMPALA-13015
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 4.4.0
Reporter: Joe McDonnell


Running dataload locally fails with this error:
{noformat}
Traceback (most recent call last):
  File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 523, in <module>
    if __name__ == "__main__": main()
  File "/home/joemcdonnell/upstream/Impala/bin/load-data.py", line 322, in main
    os.remove(jceks_path)
OSError: [Errno 2] No such file or directory: '/home/joemcdonnell/upstream/Impala/testdata/jceks/test.jceks'
Background task Loading functional-query data (pid 501094) failed.
{noformat}
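testdata/bin/create-load-data.sh calls bin/load-data.py for functional, TPC-H, 
and TPC-DS in parallel. That makes this check-then-remove logic racy, because 
one process can delete the file between another's os.path.exists() check and 
its os.remove() call: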
{noformat}
  jceks_path = TESTDATA_JCEKS_DIR + "/test.jceks"
  if os.path.exists(jceks_path):
    os.remove(jceks_path)
{noformat}
I don't see a specific reason for this cleanup to live in bin/load-data.py. It 
should be moved somewhere that doesn't run in parallel. One possible location 
is a new step in testdata/bin/create-load-data.sh.
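
Wherever the cleanup ends up, the removal itself could also be made tolerant of 
a concurrent delete. The following is only a minimal sketch, not the committed 
fix. The helper name remove_if_exists is hypothetical and does not exist in 
bin/load-data.py. It catches OSError and checks errno because the traceback 
above shows OSError, which suggests FileNotFoundError may not be available in 
the interpreter that ran the script:

```python
import errno
import os


def remove_if_exists(path):
    """Delete path, treating 'file is already gone' as success.

    Hypothetical helper for illustration only. It avoids the separate
    os.path.exists() check, so there is no window in which another
    process can delete the file between the check and the removal.
    """
    try:
        os.remove(path)
    except OSError as e:
        # ENOENT means another process (or an earlier run) already
        # removed the file; any other error is re-raised unchanged.
        if e.errno != errno.ENOENT:
            raise
```

With such a helper, the three lines quoted above would reduce to computing 
jceks_path and calling remove_if_exists(jceks_path). This only removes the 
crash. It does not settle whether deleting test.jceks from several parallel 
loaders is desirable in the first place, so moving the step may still be the 
better fix.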

This was introduced in 
[https://github.com/apache/impala/commit/9837637d9342a49288a13a421d4e749818da1432]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

