On Sat, Jun 14, 2014 at 2:13 AM, Antoine Drochon (perso) <anto...@drochon.net> wrote:
> I am running into a disk space issue when I run a parallel command
> (GNU parallel 20140322).
>
> The pseudo code is as defined below:

Please do not use pseudo code, but make a working example that shows
the problem, as described under "Reporting bugs" in the man page:

   Your bug report should always include:

   · The error message you get (if any).

   · The complete output of parallel --version. If you are not running
     the latest released version you should specify why you believe the
     problem is not fixed in that version.

   · A complete example that others can run that shows the problem.
     This should preferably be small and simple. A combination of yes,
     seq, cat, echo, and sleep can reproduce most errors. If your
     example requires large files, see if you can make them by
     something like seq 1000000 > file or yes | head -n 10000000 >
     file. If your example requires remote execution, see if you can
     use localhost.

   · The output of your example. If your problem is not easily
     reproduced by others, the output might help them figure out the
     problem.

   · Whether you have watched the intro videos
     (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
     through the tutorial (man parallel_tutorial), and read the EXAMPLE
     section in the man page (man parallel - search for EXAMPLE:).

> The Bash script performs a dig command, some pure Bash instructions,
> and writes a single line of 50 to 100 characters to stdout.

Then that should never use GB of data on /tmp.

You can try using '--results outdir'. This will create the same files
in outdir as in /tmp, but will not remove them.

> I interrupted the execution and I assume Parallel properly trapped
> the signal to clean up the temporary directory. I got back the 15 GB.
>
> Note: I was unable to see any temporary file in the tmpdir directory.

This is a feature: GNU Parallel uses tempfiles that are removed
immediately but kept open. This way, no matter how GNU Parallel may
die, the cleanup will be done by the OS. The unfortunate and surprising
effect of this is that your disk may run full, yet you cannot see any
files taking up the space.

> Any idea what could cause such a big temporary buffer output usage?

The only thing that comes to mind is if the output contains loads of
non-printable characters (e.g. \r or \0). With --results you should be
able to see how big the files are for the different arguments.

If you discover that the output is actually correct (and that it really
does take up 15 GB), then --compress might help you.

/Ole
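
P.S. A small sketch of how you could track down where the space goes.
The outdir name and the dig targets here are just examples, not taken
from your setup:

  # Keep the per-job output files instead of letting parallel remove
  # them; each job gets its own stdout/stderr file under outdir:
  parallel --results outdir 'dig +short {}' ::: example.com gnu.org

  # List the biggest result files to see which arguments blow up:
  du -ah outdir | sort -h | tail

  # While jobs are running, the removed-but-still-open tempfiles can
  # be seen with lsof (+L1 lists open files with link count 0):
  lsof +L1 | grep /tmp

  # If the output really is that big, compress the tempfiles:
  parallel --compress 'dig +short {}' ::: example.com gnu.org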