Hi SLURM devs,

I'm a bioinformatician at the University of Chicago, and I've set up a SLURM cluster on the Bionimbus Open Science Data Cloud. I'm currently facing a task that takes ~20 TB of compressed input data and generates roughly the same amount of output. The input consists of ~30 large files, and the computation merges them into one giant cohort via genome-chunk scattering, so the output is around 120 files (this number is flexible and depends only on the scattering step).

Since I don't have an instance flavor large enough (~40 TB of ephemeral space) to hold all the data, I've been looking for any feasible way to run this. My cluster has 20 nodes, each with the same flavor and 4 TB of ephemeral space. I'm wondering whether SLURM can treat multiple nodes as one and handle data across them, so that I could split the input data across all the nodes and have the SLURM jobs read inputs from different nodes and write outputs to whichever node still has space left.

I apologize if this question is naive or crazy, and I'd appreciate any advice you can offer. Sorry for the bother, and thank you all!
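For illustration, below is a rough sketch of the kind of job I have in mind, written as a SLURM job array with one task per genome chunk. The paths and the `merge_chunk.sh` script are just placeholders for my actual merging step, not real tools; the point is that each array task runs on a single node and can only see that node's ephemeral disk, which is exactly where I get stuck.

```bash
#!/bin/bash
#SBATCH --job-name=merge-cohort
#SBATCH --array=0-119            # one task per genome chunk (~120 outputs)
#SBATCH --nodes=1                # each chunk is merged on a single node
#SBATCH --cpus-per-task=4
#SBATCH --output=merge_%A_%a.log

# The genome chunk this array task is responsible for.
CHUNK=${SLURM_ARRAY_TASK_ID}

# Hypothetical locations on the node's local ephemeral disk. The ~30 input
# files would have to be split across the 20 nodes, but this task only sees
# the local disk of whichever node it lands on -- it cannot read inputs
# sitting on other nodes' ephemeral storage or write to them.
INPUT_DIR=/mnt/ephemeral/inputs
OUTPUT_DIR=/mnt/ephemeral/outputs

# merge_chunk.sh is a placeholder for the actual per-chunk merging tool.
./merge_chunk.sh \
    --chunk "${CHUNK}" \
    --inputs "${INPUT_DIR}" \
    --out "${OUTPUT_DIR}/cohort_chunk_${CHUNK}.gz"
```

So my question is really whether SLURM (or something used alongside it) can make the inputs on all 20 nodes visible to each task and let outputs land wherever space remains.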
Thanks, Shenglai