If you are able to use a modern Java implementation, you can use pure-Java streams, e.g.:
https://stackoverflow.com/a/66044221

    Files.walk(Paths.get("/path/to/root/directory")) // create a stream of paths
        .collect(Collectors.toList())    // collect paths into a list to better parallelize
        .parallelStream()                // process this stream in multiple threads
        .filter(Files::isRegularFile)    // filter out any non-files (such as directories)
        .map(Path::toFile)               // convert Path to File object
        .sorted((a, b) -> Long.compare(a.lastModified(), b.lastModified())) // sort files by date
        .limit(500)                      // limit processing to 500 files (optional)
        .forEachOrdered(f -> {
            // do processing here
            System.out.println(f);
        });

Also read: https://www.airpair.com/java/posts/parallel-processing-of-io-based-data-with-java-streams

Hope this helps some.

BOB

From: Merlin Beedell <mbeed...@cryoserver.com>
Sent: Monday, 9 May 2022 8:12 PM
To: users@groovy.apache.org
Subject: Design pattern for processing a huge directory tree of files using GPars

I am trying to process millions of files, spread over a tree of directories. At the moment I can collect the set of top-level directories into a list and then process these in parallel using GPars list processing (e.g. .eachParallel). But what would be more efficient would be a 'parallel' version of the File-handling routines, for example:

    withPool() {
        directory.eachFileMatchParallel(FILES, ~/($fileMatch)/) { aFile ->
            ...

then I would be a very happy bunny!

I know I could copy the list of matching files into an ArrayList and then use

    withPool {
        filesArray.eachParallel {
            ...

- but this does not seem like an efficient solution - especially if there are several hundred thousand files in a directory.

What design pattern(s) might be better to consider using?

Merlin Beedell
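For what it's worth, the parallel-stream approach above can be adapted to the regex-matching case in the original question. A rough sketch, assuming Java 8+; the `ParallelMatch` class name, the demo directory, and the sample pattern are made up for illustration, not from either email:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParallelMatch {
    // Walk the whole tree once, then filter file names against a regex in parallel.
    // This plays the role of a hypothetical eachFileMatchParallel: the walk itself is
    // sequential, but the per-file matching and any downstream work are parallelized.
    static List<Path> matchingFiles(Path root, Pattern fileMatch) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.collect(Collectors.toList()) // materialize so the parallel stream splits evenly
                    .parallelStream()
                    .filter(Files::isRegularFile)     // skip directories
                    .filter(p -> fileMatch.matcher(p.getFileName().toString()).matches())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway directory; real code would point at the actual tree.
        Path root = Files.createTempDirectory("demo");
        Files.createFile(root.resolve("a.log"));
        Files.createFile(root.resolve("b.txt"));
        Files.createDirectories(root.resolve("sub"));
        Files.createFile(root.resolve("sub").resolve("c.log"));

        List<Path> logs = matchingFiles(root, Pattern.compile(".*\\.log"));
        System.out.println(logs.size()); // 2: a.log and sub/c.log
    }
}
```

Collecting the paths into a list before going parallel does hold the whole listing in memory, which may matter with millions of files; with Groovy on the classpath the same `Files.walk` stream could instead feed a GPars pool, but the stream version keeps everything in the standard library.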