I read about CompositeInputFormat and how it allows one to join two
datasets together as long as those datasets were sorted and partitioned the
same way.
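
To make that requirement concrete, here is a minimal sketch in plain Java (not Hadoop code, and not how CompositeInputFormat is actually implemented) of a per-partition merge join. The class and method names are made up for illustration, and it assumes integer keys that are unique within each dataset. Partition p of one dataset is only ever compared against partition p of the other, which is why both sides have to be sorted and partitioned identically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hadoop.
public class MergeJoinSketch {

    // Inner-joins two key lists from the same partition.
    // Assumes each list is sorted ascending and has no duplicate keys.
    public static List<Integer> joinPartition(List<Integer> left, List<Integer> right) {
        List<Integer> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = Integer.compare(left.get(i), right.get(j));
            if (cmp == 0) {
                matches.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left key is smaller, advance the left cursor
            } else {
                j++; // right key is smaller, advance the right cursor
            }
        }
        return matches;
    }

    // Joins partition p of dataset a with partition p of dataset b.
    // Nothing here checks that the two partitions cover the same keys.
    public static List<Integer> join(List<List<Integer>> a, List<List<Integer>> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("partition counts differ");
        }
        List<Integer> out = new ArrayList<>();
        for (int p = 0; p < a.size(); p++) {
            out.addAll(joinPartition(a.get(p), b.get(p)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Both datasets partitioned by key % 2 and sorted within each partition.
        List<List<Integer>> a = List.of(List.of(2, 4, 6), List.of(1, 3, 5));
        List<List<Integer>> b = List.of(List.of(4, 6, 8), List.of(3, 5, 7));
        System.out.println(join(a, b));

        // Same records in b, but the partitions are in the wrong order.
        List<List<Integer>> bSwapped = List.of(List.of(3, 5, 7), List.of(4, 6, 8));
        System.out.println(join(a, bSwapped));
    }
}
```

With aligned partitions the first call finds the shared keys 4, 6, 3 and 5; with the partitions swapped the second call finds nothing, even though the underlying records are the same.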
OK, I think I get it, but something bothers me. It is suggested that two
datasets are "sorted and partitioned the same way" if they were both
outp
Your understanding is correct. The framework doesn't do anything to
align input splits across datasets. In the situation you describe,
where one can't seek among key groups in the input data, it often
makes sense to disable splitting of the individual files by setting
the min split size to Integer.