Hey,
While testing on EMR, we found that Crunch has an issue with Hadoop 2.8.5.
Specifically, it comes from the changes made for CRUNCH-679. It appears that
the CrunchRenameCopyListing class was updated to call the “shouldCopy” method
of SimpleCopyListing with two arguments:
    if (!shouldCopy(fileStatus.getPath(), options)) {
        return;
    }
Looking at the history of the SimpleCopyListing class in hadoop-distcp, it
appears that from version 2.8.0 on, “shouldCopy” no longer takes the DistCp
options as a parameter. As a result, jobs that use CrunchRenameCopyListing
fail with a NoSuchMethodError on Hadoop 2.8.0 and later. I tried working
around it by pulling in the older hadoop-distcp package using the “packages”
argument with Spark, but that does not seem to have helped. Are there any
other suggestions to try?
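
To illustrate the mismatch without needing Hadoop on the classpath, here is a
minimal sketch. OldListing and NewListing are hypothetical stand-ins for the
two shapes of “shouldCopy” (they are not Hadoop classes, and the _SUCCESS
filter is made up for the demo). callShouldCopy is an illustrative reflection
lookup, not code from Crunch or Hadoop:

```java
import java.lang.reflect.Method;

public class ShouldCopyShim {

    // Hypothetical stand-in for the older signature: (path, options).
    static class OldListing {
        protected boolean shouldCopy(String path, Object options) {
            return !path.endsWith("_SUCCESS"); // made-up filter for the demo
        }
    }

    // Hypothetical stand-in for the newer signature: (path) only.
    static class NewListing {
        protected boolean shouldCopy(String path) {
            return !path.endsWith("_SUCCESS"); // made-up filter for the demo
        }
    }

    // Tries the two-argument form first, then falls back to the one-argument
    // form. A direct two-argument call compiled against the old API is what
    // fails with NoSuchMethodError when only the new form exists at runtime.
    // Note: getDeclaredMethod only sees methods declared on the class itself;
    // a subclass of a real listing class would need to walk up the hierarchy.
    static boolean callShouldCopy(Object listing, String path, Object options)
            throws Exception {
        Class<?> type = listing.getClass();
        try {
            Method m = type.getDeclaredMethod("shouldCopy", String.class, Object.class);
            m.setAccessible(true);
            return (Boolean) m.invoke(listing, path, options);
        } catch (NoSuchMethodException e) {
            Method m = type.getDeclaredMethod("shouldCopy", String.class);
            m.setAccessible(true);
            return (Boolean) m.invoke(listing, path);
        }
    }

    public static void main(String[] args) throws Exception {
        // prints "true"
        System.out.println(callShouldCopy(new OldListing(), "/data/part-0", null));
        // prints "false"
        System.out.println(callShouldCopy(new NewListing(), "/data/_SUCCESS", null));
    }
}
```

A lookup like this could in principle let one build tolerate both signatures,
but that is untested against real Hadoop versions.
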
Thanks,
Dave