Oh, OK. Thanks. So basically, to be on the safe side, one can always have it return false and keep the record data consistent. I mean, the length of all the records should be the same.
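
To make the trade-off concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It is a simplified model, not Hadoop's actual split logic: the class SplitSketch and the method planSplits are hypothetical names invented for this illustration, and the model assumes one split per block and ignores any split-size tuning the framework applies. It only shows the effect discussed in this thread: a splittable 129 MB file with a 128 MB block size is planned as two byte ranges, while a non-splittable one is handed over as a single range, so no record is cut but only one map task reads the whole file. (In Hadoop itself the method on FileInputFormat is, as far as I know, spelled isSplitable.)

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of input-split planning. NOT Hadoop's implementation;
// all names here are hypothetical and exist only for illustration.
final class SplitSketch {

    static final long MB = 1024L * 1024L;

    /**
     * Returns the planned splits as {offset, length} pairs.
     * Assumption: one split per block when the file is splittable,
     * one split covering the whole file when it is not.
     */
    static List<long[]> planSplits(long fileLength, long blockSize, boolean splittable) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        List<long[]> splits = new ArrayList<>();
        if (fileLength == 0) {
            return splits; // an empty file yields no splits in this model
        }
        if (!splittable) {
            // The whole file goes to a single map task, so no record is ever cut.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        // Splittable: cut at block boundaries; the last split holds the remainder.
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            splits.add(new long[] {offset, Math.min(blockSize, fileLength - offset)});
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 129 * MB; // the 129 MB file from the original question
        long blockSize = 128 * MB;  // the 128 MB block size from the original question
        for (boolean splittable : new boolean[] {true, false}) {
            List<long[]> splits = planSplits(fileLength, blockSize, splittable);
            System.out.println("splittable=" + splittable + " -> " + splits.size() + " split(s)");
            for (long[] s : splits) {
                System.out.println("  offset=" + s[0] + " length=" + s[1]);
            }
        }
    }
}
```

In this model, returning false everywhere is always safe for record integrity, but it gives up parallelism within a file, so it is a cost to weigh rather than a free default.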

--
Thanks & Regards,
Sugandha Naolekar

On Wed, Feb 26, 2014 at 3:57 PM, Dieter De Witte <[email protected]> wrote:

> No. An example could be records that span a variable number of lines: if
> you then allowed the file to be split, a record might be broken, so in
> that case you could override isSplittable to always return false.
>
> 2014-02-26 11:22 GMT+01:00 Sugandha Naolekar <[email protected]>:
>
>> So basically what I can deduce from it is, isSplittable() only applies to
>> stream-compressed files. Right?
>>
>> --
>> Thanks & Regards,
>> Sugandha Naolekar
>>
>> On Wed, Feb 26, 2014 at 2:06 PM, Jeff Zhang <[email protected]> wrote:
>>
>>> Hi Sugandha,
>>>
>>> Take a gz file as an example. It is not splittable because of the
>>> compression algorithm it uses. There is no guarantee that a record is
>>> located in a single block; if a record spans 2 blocks, your program will
>>> crash since you cannot get the whole record.
>>>
>>> On Wed, Feb 26, 2014 at 1:24 PM, Sugandha Naolekar
>>> <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> A single file of size 129 MB is split into two HDFS blocks, as the
>>>> max block size is 128 MB, and each of the blocks is read depending on
>>>> the InputFormat it supports. What, then, is the significance of the
>>>> isSplittable() method?
>>>>
>>>> If it is set to false, will the entire block be considered a single
>>>> input split? How will TextInputFormat react to it?
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Sugandha Naolekar
