The issue seemed weird to me as well. It was not reproducible, so I
assumed that something had gone wrong with the installation. The issue
occurred in January and happened again over the weekend. This was using
Ignite 1.5.0.final.
I've verified that all the nodes are configured using
FifoQueueCollisionSpi with parallelJobsNumber = 1.
The nodes that execute the jobs are configured via XML:
    ... <property name="collisionSpi">
        <bean class="org.apache.ignite.spi.collision.fifoqueue.FifoQueueCollisionSpi">
            <property name="parallelJobsNumber" value="1"/>
        </bean>
    </property> ...
Based on your previous response, I believe the collision SPI on the node
submitting the task does not matter. Just in case, that node also has
the SPI configured:
    IgniteConfiguration igniteConfig = new IgniteConfiguration();
    igniteConfig.setMarshaller(new OptimizedMarshaller());
    igniteConfig.setMetricsLogFrequency(3600000);
    FifoQueueCollisionSpi colSpi = new FifoQueueCollisionSpi();
    colSpi.setParallelJobsNumber(1);
    igniteConfig.setCollisionSpi(colSpi);
    ...
On the previous occurrence of this bug I added this code to the job
execution:
    CollisionSpi collisionSpi = grid.configuration().getCollisionSpi();
    if (collisionSpi instanceof FifoQueueCollisionSpi) {
        FifoQueueCollisionSpi fifo = (FifoQueueCollisionSpi) collisionSpi;
        int parallelJobsNumber = fifo.getParallelJobsNumber();
        _logger.info("FifoQueueCollisionSpi used with parallelJobsNumber:" + parallelJobsNumber);
    } else {
        _logger.info("CollisionSpi is not FifoQueueCollisionSpi but:" + collisionSpi.getClass().getSimpleName());
    }
And in the logs I see:
FifoQueueCollisionSpi used with parallelJobsNumber:1
However, I also see three jobs starting on the same node. The jobs can
take minutes to hours to complete, and unfortunately they have to
interact with a GUI application. When multiple jobs execute at the
same time, there are race conditions over which workspace the GUI
application has open. Also, during job execution the GUI application
computes some values. If multiple computations run at the same time,
the results get mixed up.
Are there known issues with FifoQueueCollisionSpi? Are there any
workarounds?
I'm considering adding an AtomicInteger counter check in the job
execution code. Do you have any suggestions? I was thinking that if I
had failover set up, it should be safe to fail any jobs that attempt to
start concurrently.
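
A minimal sketch of the counter check described above, in plain Java with no Ignite calls. The class name SingleJobGuard and its methods are hypothetical. It assumes each executing node runs in its own JVM, since the counter is static and therefore per JVM. Whether the thrown exception actually causes the job to fail over depends on the task's result policy and the failover SPI configuration, which this sketch does not cover.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-JVM guard that lets at most one job body run at a time. */
final class SingleJobGuard {
    /** 0 when no job is running in this JVM, 1 while a job is running. */
    private static final AtomicInteger RUNNING = new AtomicInteger();

    private SingleJobGuard() {
    }

    /**
     * Runs the body if no other job is running in this JVM.
     * Otherwise fails fast with IllegalStateException and does not run the body.
     */
    static <T> T runExclusively(Callable<T> body) throws Exception {
        // Atomically claim the single slot; a second concurrent caller loses the race.
        if (!RUNNING.compareAndSet(0, 1)) {
            throw new IllegalStateException("Another job is already running on this node");
        }
        try {
            return body.call();
        } finally {
            // Release the slot even if the body threw.
            RUNNING.set(0);
        }
    }

    /** True while a job body is executing in this JVM. */
    static boolean isBusy() {
        return RUNNING.get() != 0;
    }
}
```

The job's execute method would then wrap the GUI work, for example SingleJobGuard.runExclusively(() -> doGuiWork()), where doGuiWork is a placeholder for the existing job body. Failing fast rather than blocking keeps a rejected job from waiting for hours behind the running one.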
Lastly, thanks for the hard work on Ignite (and GridGain!).
-Ryan
On 11/7/2016 6:04 PM, vkulichenko wrote:
Collision SPI is called on the node that executes the job. Having said that,
what you tell sounds a bit weird. Are you sure other nodes didn't lose the
config as well?
-Val
--
View this message in context:
http://apache-ignite-users.70518.x6.nabble.com/Concurrent-job-execution-and-FifoQueueCollisionSpi-parallelJobsNumber-1-tp8697p8749.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.