Could anyone share patterns for ensuring that queue messages are not lost when a consumer node crashes or a bug causes an unhandled error?
Since Queue does not appear to offer transactional behavior for users of the queue (and it is pretty unclear what that would look like), something else is needed.

The option I am using currently is to create a copy of each message in a shared file system to which all client nodes have access. While the message is being processed, a separate thread updates the copy's expiration timestamp (if needed). A periodic check finds any expired copies and pushes them back on the queue. Copies are deleted after successful processing or recovery. The same could be done more efficiently with a database, but this particular application handles a large flow of files and we did not want to introduce a DB.

Now we are moving to AWS S3, and I am not sure I want to use S3 to mimic this behavior (there is a charge for every S3 operation). I guess the options I have are:

a) Rewrite away from Ignite queues to Amazon SQS, which is based on polling but has a very nice feature: taken messages become invisible for a configured period of time, which can be extended, and they show up on the queue again once the time expires if the consumer has not deleted them. That is exactly what I was doing with my file-based recovery.
b) Provision a small DynamoDB to replicate what I was doing with files.
c) Try to use an Ignite (persistent) map to deposit copies of in-flight messages, handled in the same way.
d) Something I did not think about that you may suggest :-)

If I switch away from Ignite queues to AWS, only extensive Semaphores and Locks (and a few maps) will be left. It may not be worth maintaining an Ignite cluster for that, and I may have to consider using some Dynamo/polling-based AWS distributed locking...

Would appreciate your comments - maybe I am missing something Ignite could do. I do not have that much experience with Ignite - just switched from Hazelcast.
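
To make the recovery scheme concrete, here is a minimal Java sketch of the lease idea that options a) to c) all share. It is not Ignite or SQS code: the class name `InFlightLeases` and its methods are hypothetical, a plain `java.util.Queue` stands in for the distributed queue, and a `ConcurrentHashMap` stands in for whatever shared store holds the in-flight copies (files, DynamoDB, or a persistent map). Time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * Hypothetical helper illustrating lease-based recovery of in-flight messages.
 * Messages are used as map keys, so they need sensible equals/hashCode.
 */
class InFlightLeases<T> {
    // Stand-in for the distributed queue (assumption, not an Ignite type).
    private final Queue<T> queue;
    // Stand-in for the shared store of in-flight copies: message -> lease expiry (ms).
    private final Map<T, Long> inFlight = new ConcurrentHashMap<>();
    private final long leaseMillis;

    InFlightLeases(Queue<T> queue, long leaseMillis) {
        this.queue = queue;
        this.leaseMillis = leaseMillis;
    }

    /** Takes the next message and records a copy with a fresh lease; null if the queue is empty. */
    T poll(long now) {
        T message = queue.poll();
        if (message != null) {
            inFlight.put(message, now + leaseMillis);
        }
        return message;
    }

    /** Heartbeat from the consumer: pushes the expiry forward while processing continues. */
    boolean extend(T message, long now) {
        return inFlight.computeIfPresent(message, (k, v) -> now + leaseMillis) != null;
    }

    /** Called after successful processing: drops the in-flight copy. */
    void ack(T message) {
        inFlight.remove(message);
    }

    /** Periodic sweep: re-queues every message whose lease has expired; returns how many. */
    int recoverExpired(long now) {
        int recovered = 0;
        for (Map.Entry<T, Long> e : inFlight.entrySet()) {
            Long expiry = e.getValue();
            // remove(key, value) only succeeds if the lease was not extended or acked meanwhile.
            if (expiry <= now && inFlight.remove(e.getKey(), expiry)) {
                queue.add(e.getKey());
                recovered++;
            }
        }
        return recovered;
    }
}
```

A consumer would call `poll`, heartbeat with `extend` from a background thread, and call `ack` when done, while some node runs `recoverExpired` on a timer. Note that the sketch still has a small window between taking the message and recording its copy; closing that gap is exactly where a transactional queue, or the visibility timeout in SQS, would help.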