Hi MrNode,

This task description is too generic for a Node-specific answer. We don't 
know how the file is processed. We don't know whether the CSV processing 
needs to be atomic; it seems so, but it isn't clear. We don't know whether 
you can abort it. We don't know whether you want to block the upload 
function completely, or only block it because you can't abort the previous 
job. Something like this needs to be designed with the task specifics in mind.

Here are a few scenarios, depending on your requirements.

Assuming that the 100,000 rows in the CSV are individual "work items" and 
the whole CSV is a "batch of work", you can, on upload, simply create a 
queue of 100,000 things to process. Also keep a hashmap of these work 
items, so you can address them by batch. Then have a work dispatch protocol 
and a Node microservice that takes the rows for processing one by one, 
moving each off the batch queue into another queue, "pending batch finish". 
Once everything is done, mark the "pending finish" items as completed and 
clear the batch. Expose a "Cancel batch work" function to the user: if they 
click "cancel current batch", you clear all the pending tasks so that your 
worker microservice stops processing them. Also mark the batch as canceled, 
so you can clean up the work items that were already completed.
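
To make the batch idea concrete, here is a rough in-memory sketch. All the 
names (Batch, createBatch, processNext, cancelBatch) are made up for 
illustration, and a real setup would keep this state in a shared store 
rather than in one process.

```javascript
// Hypothetical sketch: one batch = one uploaded CSV, one item = one row.
class Batch {
  constructor(id, rows) {
    this.id = id;
    this.canceled = false;
    // Items still waiting for a worker.
    this.queue = rows.map((row, index) => ({ index, row }));
    // Items already processed, waiting for the whole batch to finish.
    this.pendingFinish = [];
    // Items of a fully finished batch.
    this.completed = [];
  }
}

// The "hashmap" that lets us address work by batch id.
const batches = new Map();

function createBatch(id, rows) {
  const batch = new Batch(id, rows);
  batches.set(id, batch);
  return batch;
}

// Worker side: take one item, process it, park it in "pending finish".
// Returns false when there is nothing (left) to do for this batch.
function processNext(batchId, handler) {
  const batch = batches.get(batchId);
  if (!batch || batch.canceled || batch.queue.length === 0) return false;
  const item = batch.queue.shift();
  item.result = handler(item.row);
  batch.pendingFinish.push(item);
  if (batch.queue.length === 0) {
    // Last item done: promote everything to "completed".
    batch.completed = batch.pendingFinish;
    batch.pendingFinish = [];
  }
  return true;
}

// User side: drop all queued items and hand back the items that were
// already processed, so the caller can undo or discard them.
function cancelBatch(batchId) {
  const batch = batches.get(batchId);
  if (!batch) return [];
  batch.canceled = true;
  batch.queue = [];
  const toCleanUp = batch.pendingFinish;
  batch.pendingFinish = [];
  return toCleanUp;
}
```

Because processNext checks the canceled flag before taking an item, a 
worker stops picking up rows as soon as the user cancels, and the return 
value of cancelBatch tells you which finished rows still need cleaning up.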

If you can't break processing into items (maybe you're aggregating things 
over those 100,000 rows), perhaps you can run this aggregation in an 
interruptible loop. Then your "Cancel batch work" would first check if 
there are running aggregations and interrupt/abort them, then proceed as 
planned and run the new file.
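
An interruptible loop could look roughly like the sketch below. The 
aggregation here is just a sum, and the token object with its canceled 
flag is my own invention; your real aggregation and cancel signal will 
differ.

```javascript
// Hypothetical aggregation split into chunks: the generator pauses after
// every `chunkSize` rows so the caller gets a chance to stop it.
function* aggregateInChunks(rows, chunkSize) {
  let sum = 0;
  for (let i = 0; i < rows.length; i += 1) {
    sum += rows[i];
    if ((i + 1) % chunkSize === 0) yield { processed: i + 1, sum };
  }
  return { processed: rows.length, sum };
}

// Driver: between chunks, yield to the event loop (so a cancel request
// can actually arrive) and then check the cancel flag.
async function runInterruptible(rows, token, chunkSize = 1000) {
  const steps = aggregateInChunks(rows, chunkSize);
  let step = steps.next();
  while (!step.done) {
    await new Promise((resolve) => setImmediate(resolve));
    if (token.canceled) return { canceled: true, ...step.value };
    step = steps.next();
  }
  return { canceled: false, ...step.value };
}
```

Your "Cancel batch work" handler would then set token.canceled = true, 
wait for runInterruptible to resolve, and only after that start the new 
file.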

Your least flexible option, i.e. when you cannot stop the processing once 
it has started, is to at least provide an upload queue: the first file 
uploaded is processed, and you expose an endpoint for uploading additional 
CSVs. These are just sent to the server and wait. You can still cancel 
those "pending" CSVs and upload new ones instead, even if you can't 
interrupt the main, running CSV. Then expose a simple "status" endpoint 
that reports progress to the user, e.g. "processed 40,000 of 100,000 rows, 
3 CSVs pending processing".
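
A bare-bones version of that upload queue might look like this. The 
UploadQueue class and its method names are hypothetical, and I left out the 
HTTP endpoints; you would call these methods from your own routes.

```javascript
// Hypothetical sketch: one running CSV that cannot be stopped, plus a
// list of pending CSVs that can still be canceled.
class UploadQueue {
  constructor() {
    this.running = null; // { id, totalRows, processedRows } or null
    this.pending = [];   // jobs waiting for the running one to finish
  }

  // Called by the upload endpoint: start right away if idle, else wait.
  upload(id, totalRows) {
    const job = { id, totalRows, processedRows: 0 };
    if (this.running === null) this.running = job;
    else this.pending.push(job);
    return job;
  }

  // Only pending jobs can be canceled; the running one cannot.
  cancelPending(id) {
    const index = this.pending.findIndex((job) => job.id === id);
    if (index === -1) return false;
    this.pending.splice(index, 1);
    return true;
  }

  // Called by the processor as it works through the running CSV.
  reportProgress(processedRows) {
    if (this.running) this.running.processedRows = processedRows;
  }

  // Called by the processor when the running CSV is done.
  finishRunning() {
    this.running = this.pending.shift() || null;
  }

  // What the "status" endpoint would return.
  status() {
    if (!this.running) return 'idle';
    const { processedRows, totalRows } = this.running;
    return `processed ${processedRows} of ${totalRows} rows, ` +
      `${this.pending.length} CSVs pending processing`;
  }
}
```

Note that cancelPending deliberately refuses to touch the running job, 
which matches the assumption that processing cannot be stopped once it has 
started.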

You would have to keep all those locks, queues and statuses outside the 
running process - Redis is probably the simplest to use - because you might 
be running these tasks (upload, processing, statuses) on different servers, 
or at the very least, on different workers in your Node cluster.
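
To show the shape of such an external lock without pulling in a Redis 
client, here is a sketch against an in-memory stand-in. MemoryStore, 
setIfAbsent, deleteIfValue and the lock key are all invented for the 
example; with real Redis the acquire step would map to the 
`SET key value NX PX ttl` command, and the release step would need to be 
an atomic compare-and-delete.

```javascript
// Hypothetical in-memory stand-in for a shared store such as Redis.
// Time is passed in explicitly so the behavior is easy to follow.
class MemoryStore {
  constructor() {
    this.entries = new Map();
  }

  // Set the key only if it is absent or expired (like SET ... NX PX).
  setIfAbsent(key, value, ttlMs, now) {
    const current = this.entries.get(key);
    if (current && current.expiresAt > now) return false;
    this.entries.set(key, { value, expiresAt: now + ttlMs });
    return true;
  }

  // Delete the key only if it is unexpired and still holds `value`.
  deleteIfValue(key, value, now) {
    const current = this.entries.get(key);
    if (!current || current.expiresAt <= now) return false;
    if (current.value !== value) return false;
    this.entries.delete(key);
    return true;
  }
}

const LOCK_KEY = 'csv:processing-lock'; // made-up key name

// A worker may start processing only if it gets the lock.
function acquireProcessingLock(store, workerId, ttlMs, now = Date.now()) {
  return store.setIfAbsent(LOCK_KEY, workerId, ttlMs, now);
}

// Only the worker that owns the lock may release it.
function releaseProcessingLock(store, workerId, now = Date.now()) {
  return store.deleteIfValue(LOCK_KEY, workerId, now);
}
```

The expiry is there so that a crashed worker does not block uploads 
forever; the owner check on release keeps one worker from removing a lock 
that another worker has taken in the meantime.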

But these are very rough guesses. With a lot more details, I could provide 
a better overview. Shoot if you have other questions.


