Hi,
I want to run the wordcount example on YARN with compression enabled, using an input directory that contains several files. For that, I must compress the input first.
dir1/file1.txt
dir1/file2.txt
dir1/file3.txt
dir1/file4.txt
dir1/file5.txt

1 - Should I compress the whole directory or each file in it?
2 - Should I use gzip or bzip2?
3 - Do I need to set up any YARN configuration file?
4 - While the job is running, are the files decompressed before the mappers run and compressed again after the reducers finish?
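
To make question 1 concrete, the "each file" option would look roughly like the Python sketch below. The function name compress_each_file is hypothetical and used only for illustration. It assumes the files are on the local disk before being uploaded, and it keeps the originals. Whether Hadoop reads the result the way I expect is exactly what I am asking.

```python
import bz2
import gzip
import shutil
from pathlib import Path


def compress_each_file(src_dir, codec="gzip"):
    """Compress every .txt file in src_dir on its own, keeping the originals.

    Hypothetical helper for illustration; codec is "gzip" or "bzip2".
    Returns the list of compressed files that were written.
    """
    if codec == "gzip":
        opener, suffix = gzip.open, ".gz"
    elif codec == "bzip2":
        opener, suffix = bz2.open, ".bz2"
    else:
        raise ValueError(f"unsupported codec: {codec}")

    written = []
    for path in sorted(Path(src_dir).glob("*.txt")):
        # file1.txt -> file1.txt.gz (or file1.txt.bz2), next to the original
        target = path.with_name(path.name + suffix)
        with open(path, "rb") as src, opener(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

Calling compress_each_file("dir1") would write dir1/file1.txt.gz through dir1/file5.txt.gz, and passing codec="bzip2" would write the .bz2 variants instead.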
-- Thanks,
