Assuming you always read the data together, one large file is fine; that is a basic HDFS use case.
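
A rough way to see this, as a sketch only: a splittable file is cut into input splits of roughly one block each, so one large Avro file and several block-sized files holding the same data should yield about the same number of tasks, while files far smaller than a block yield many more. The 128 MiB block size, the helper name `estimate_splits`, and the plain ceiling division are assumptions for illustration; the real split size depends on your cluster's `dfs.blocksize` and the input format's settings.

```python
import math

# Assumed HDFS block size (128 MiB); check dfs.blocksize on your cluster.
BLOCK_SIZE = 128 * 1024 * 1024


def estimate_splits(file_size_bytes, split_size_bytes=BLOCK_SIZE):
    """Rough count of input splits for one splittable file.

    Hypothetical helper: assumes split size == block size and
    ignores any min/max split-size tuning.
    """
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / split_size_bytes)


# The same 10 blocks' worth of data, laid out three ways.
one_large = estimate_splits(10 * BLOCK_SIZE)
block_sized = sum(estimate_splits(BLOCK_SIZE) for _ in range(10))
tiny_files = sum(estimate_splits(BLOCK_SIZE // 16) for _ in range(160))

print(one_large, block_sized, tiny_files)
```

Under these assumptions the single large file and the block-sized files come out the same, and only the sub-block files inflate the task count.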

On Tue, 5 Nov 2019 at 4:28 am, Yaniv Harpaz <yaniv.har...@gmail.com> wrote:

> It depends on your usage (when and how you read).
> Are the smaller files you were thinking about also larger than the HDFS
> block size?
> I would not go for something smaller than a block.
>
> Usually (if relevant to the way you read the data) the partitioning helps
> determine that.
>
>
> Yaniv Harpaz
> [ yaniv.harpaz at gmail.com ]
>
>
> On Mon, Nov 4, 2019 at 7:03 PM Sam <games2013....@gmail.com> wrote:
>
>> Hi,
>>
>> How do we choose between a single large Avro file (size much larger than
>> the HDFS block size) vs multiple smaller Avro files (close to the HDFS
>> block size)?
>>
>> Since Avro is splittable, is there even a need to split a very large Avro
>> file into smaller files?
>>
>> I’m assuming that a single large Avro file can also be split across
>> multiple mappers/reducers/executors during processing.
>>
>> Thanks.
>>
>
--
Best Regards,
Ayan Guha