alirezazamani commented on a change in pull request #1346:
URL: https://github.com/apache/helix/pull/1346#discussion_r485126373



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestEnqueueJobs.java
##########
@@ -135,10 +146,18 @@ public void testQueueParallelJobs() throws InterruptedException {
             .setCommand(MockTask.TASK_COMMAND).setMaxAttemptsPerTask(2)
             .setJobCommandConfigMap(Collections.singletonMap(MockTask.JOB_DELAY, "10000"));
 
+    _driver.waitToStop(queueName, 5000L);
+
     // Add 4 jobs to the queue
+    List<String> jobNames = new ArrayList<>();
+    List<JobConfig.Builder> jobBuilders = new ArrayList<>();
     for (int i = 0; i < numberOfJobsAddedBeforeControllerSwitch; i++) {
-      _driver.enqueueJob(queueName, "JOB" + i, jobBuilder);
+      jobNames.add("JOB" + i);
+      jobBuilders.add(jobBuilder);
     }
+    _driver.enqueueJobs(queueName, jobNames, jobBuilders);

Review comment:
       Yeah, batching would help. The issue seems inherent to how Helix/ZK behaves when a user adds jobs in a row, so if this is the behavior, batch job addition is the better approach. Say the user adds job1, job2, and job3 one by one:
   At T1: job1 and job2 are added and the JobDAG is changed.
   At T2: We get the children of the config, so we see the new configs of job1 and job2 as well as the change in the DAG.
   At T3: job3 is added and is being written to the DAG.
   At T4: A refresh starts, and the controller sees the configs of job1 and job2 while the DAG contains job1, job2, and job3.
   Now in the pipeline, since we see job3 in the DAG but do not see its config, we purge job3 and remove its config from ZK. Hence job3 will never finish.
   
   To answer your question, it depends: on when you resume the queue, and on which part of the logic we are in. If we resume the queue but the controller has not yet received the notification for the new JobConfig, it can skip scheduling in the first pipeline run. But the main issue is that if the controller does not see the JobConfig (which is highly likely when we don't add jobs in a batch and instead keep adding them one by one in a loop), then the purge can delete that job (because the controller sees the job in the DAG but its config is missing), and the job never finishes.
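   To make the difference concrete, here is a minimal sketch of why the batched `enqueueJobs` call from the diff closes the race window. The `FakeDriver` below is a hypothetical stand-in for Helix's `TaskDriver` (it only counts DAG writes, it does not talk to ZK); the point is that one-by-one enqueueing produces one DAG update per job, each of which the controller may observe before the matching config, while batching produces a single atomic-looking update:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchEnqueueSketch {
  /**
   * Hypothetical stand-in for Helix's TaskDriver: it only counts how many
   * times the queue's DAG is rewritten (each rewrite is a window in which
   * the controller may see a job in the DAG without seeing its config).
   */
  static class FakeDriver {
    int dagWrites = 0;

    // One-by-one enqueue: every call rewrites the DAG, so N jobs open N windows.
    void enqueueJob(String queue, String job, Object jobBuilder) {
      dagWrites++;
    }

    // Batched enqueue: all jobs land in a single DAG rewrite, so the controller
    // never observes a job in the DAG whose config it has not also received.
    void enqueueJobs(String queue, List<String> jobs, List<Object> jobBuilders) {
      dagWrites++;
    }
  }

  public static void main(String[] args) {
    FakeDriver oneByOne = new FakeDriver();
    for (int i = 0; i < 4; i++) {
      oneByOne.enqueueJob("TestQueue", "JOB" + i, new Object());
    }

    FakeDriver batched = new FakeDriver();
    List<String> jobNames = new ArrayList<>();
    List<Object> jobBuilders = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      jobNames.add("JOB" + i);
      jobBuilders.add(new Object());
    }
    batched.enqueueJobs("TestQueue", jobNames, jobBuilders);

    System.out.println("one-by-one DAG writes: " + oneByOne.dagWrites);
    System.out.println("batched DAG writes: " + batched.dagWrites);
  }
}
```

   With four jobs, the one-by-one path performs four DAG writes while the batched path performs one, which is exactly why the test in this PR collects `jobNames`/`jobBuilders` first and calls `enqueueJobs` once.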




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
