Hi, let's say I have so many files to pass to the "-input" switch of Hadoop Streaming that I hit the shell's command-line length limit ("argument list too long" error).
I can't move my input files into a new directory (they might be in use by someone else), nor copy them into a new directory (for performance reasons: too many big files). I also can't use a regular expression on the input file names (new files matching the pattern might arrive that I don't want to include in processing yet). *So my question is*: is there a way in Hadoop Streaming to handle the above scenario (for example, by providing a local file containing a long list of HDFS paths, instead of writing the file names directly on the command line)? Thanks, -Abdul
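One workaround worth sketching (not an official Hadoop Streaming feature, just a shell-level idea): since `-input` may be repeated, a local manifest file can be split into chunks, with one streaming job launched per chunk so that each command line stays under the limit. Everything below (`paths.txt`, the chunk size, the toy HDFS paths, the commented-out `hadoop jar` line) is an illustrative assumption:

```shell
#!/bin/sh
# Hypothetical sketch: turn a local manifest of HDFS paths into several
# streaming jobs, each with a small, safe number of "-input" flags.
# File names and chunk size are made up for illustration.

printf 'hdfs:///data/part-%05d\n' 0 1 2 3 4 5 > paths.txt  # toy manifest
split -l 2 paths.txt chunk.                                # 2 paths per job here

for f in chunk.*; do
  set --                                  # reset the positional-argument list
  while read -r p; do
    set -- "$@" -input "$p"               # append one -input flag per path
  done < "$f"
  # A real run would launch one job per chunk and merge outputs afterwards:
  #   hadoop jar hadoop-streaming.jar "$@" -mapper cat -output /out/"$f"
  echo "job for $f: $# args"
done
```

The chunk size would be chosen so the total argument length stays below the system's `ARG_MAX` (`getconf ARG_MAX`); the cost is running several jobs whose outputs must be combined.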