No. Another example: your records may span a variable number of lines. If you allowed such a file to be split, a record could be broken across two splits, so you could override isSplitable() to always return false.
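To make the variable-line case concrete, here is a rough Java sketch. It uses only the standard library, so it is a simulation and not Hadoop code. It assumes ASCII records separated by a blank line. When splitting is allowed, it cuts the file every splitSize bytes, which is a simplification because Hadoop's real split computation has extra rules. SplitSketch, computeSplits and splitsStartingMidRecord are made-up names for illustration. In Hadoop itself, the equivalent switch is overriding isSplitable(JobContext, Path) in a FileInputFormat subclass such as TextInputFormat and returning false.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    /** A byte range [start, start + length) of one file, standing in for an input split. */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Simplified split computation (hypothetical helper): cut the file every
     * splitSize bytes if it is splitable, otherwise return one split for the
     * whole file. Hadoop's real getSplits() applies more rules than this.
     */
    static List<Split> computeSplits(long fileLength, long splitSize, boolean splitable) {
        List<Split> splits = new ArrayList<>();
        if (!splitable) {
            splits.add(new Split(0, fileLength));
            return splits;
        }
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    /** Offsets where records begin, assuming ASCII records separated by a blank line. */
    static List<Long> recordStarts(String data) {
        List<Long> starts = new ArrayList<>();
        starts.add(0L);
        int idx = data.indexOf("\n\n");
        while (idx >= 0) {
            long next = idx + 2L;
            if (next < data.length()) {
                starts.add(next);
            }
            idx = data.indexOf("\n\n", idx + 2);
        }
        return starts;
    }

    /** Count the splits whose first byte falls inside a record rather than at its start. */
    static int splitsStartingMidRecord(String data, long splitSize, boolean splitable) {
        List<Long> starts = recordStarts(data);
        int broken = 0;
        for (Split s : computeSplits(data.length(), splitSize, splitable)) {
            if (!starts.contains(s.start)) {
                broken++;
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Three records of 2, 3 and 1 lines, separated by blank lines (38 bytes in total).
        String data = "id=1\nname=a\n\nid=2\nname=b\nnote=x\n\nid=3\n";
        System.out.println("splitable:     " + splitsStartingMidRecord(data, 16, true));
        System.out.println("not splitable: " + splitsStartingMidRecord(data, 16, false));
    }
}
```

The sample uses a deliberately tiny 16-byte split size so the effect is visible. The splitable case prints 2 because the splits starting at bytes 16 and 32 both begin in the middle of a record. The non-splitable case prints 0 because the whole file becomes a single split, so every record stays intact.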
2014-02-26 11:22 GMT+01:00 Sugandha Naolekar:
> So basically what I can deduce from it is that isSplittable() only applies
> to stream-compressed files. Right?
>
> --
> Thanks & Regards,
> Sugandha Naolekar
>
> On Wed, Feb 26, 2014 at 2:06 PM, Jeff Zhang wrote:
>
>> Hi Sugandha,
>>
>> Take a gz file as an example. It is not splittable because of the
>> compression algorithm it uses. It cannot guarantee that one record is
>> located in one block; if one record is in 2 blocks, your program will crash
>> since you cannot get the whole record.
>>
>> On Wed, Feb 26, 2014 at 1:24 PM, Sugandha Naolekar wrote:
>>
>>> Hello,
>>>
>>> If a single file of size 129 MB is split into two halves/blocks in HDFS,
>>> as the max block size is 128 MB, each of the blocks is read depending on
>>> the InputFormat it supports. So what is the significance of the
>>> isSplittable() method then?
>>>
>>> If it is set to false, will the entire block be considered a single input
>>> split? How will TextInputFormat react to it?
>>>
>>> --
>>> Thanks & Regards,
>>> Sugandha Naolekar
