Thanks, Bikas, for your input! Could you tell me which property needs to be set to enable Tez grouping?
Thanks and regards,
Nitin

On Thu, Apr 21, 2016 at 11:26 PM, Bikas Saha <[email protected]> wrote:

> Tez grouping (if enabled) is explained here.
> https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
>
> For the rest of the questions, the hive user mailing list would be a
> better avenue for answers.
>
> Bikas
>
> *From:* Nitin Kumar [mailto:[email protected]]
> *Sent:* Wednesday, April 20, 2016 10:54 PM
> *To:* [email protected]
> *Subject:* Managing input split sizes in Hive running the tez engine
>
> Hi,
>
> I want to gain a better understanding of how the input splits are
> calculated in the tez engine.
>
> I am aware that the *hive.input.format* property can be set to either
> *HiveInputFormat* (default) or to *CombineHiveInputFormat* (generally
> accepted for a large number of files having sizes << hdfs block size).
>
> I was hoping someone could walk me through the differences in how
> *HiveInputFormat* and *CombineHiveInputFormat* calculate split sizes as
> data file sizes vary from small (smaller than a block) to large (spanning
> multiple blocks).
>
> I want to dictate the number of mapper tasks that are spawned for scanning
> a table. For the MR engine this can be controlled by setting the
> *mapred.min.split.size* and *mapred.max.split.size* properties. I need to
> know if there are similar configurations for the tez engine.
>
> Also, the properties *tez.grouping.max-size*, *tez.grouping.min-size* and
> *tez.grouping.split-waves* have been set to the values of 1GB, 16MB and
> 1.7 respectively. However, I observed that the created input splits do
> not adhere to these properties.
>
> I had two files of size 3MB each for a table. According to the set
> properties, only 1 mapper task should have spawned, but 2 mapper tasks
> spawned instead.
>
> Are there other properties in hive/tez that need to be set to enable input
> split grouping?
>
> I would highly appreciate your inputs.
>
> Thanks and regards,
>
> Nitin
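For reference, the sizing arithmetic behind the three grouping properties quoted above can be sketched roughly as follows. This is only a simplified illustration of the heuristic described on the linked wiki page, not Tez's actual code: the function name, its parameters and the `available_slots` input are hypothetical, and it ignores data locality and any grouping rules Hive applies on top.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def estimate_grouped_task_count(total_bytes, original_splits, available_slots,
                                waves=1.7, min_size=16 * MB, max_size=1 * GB):
    """Rough estimate of the task count after split grouping.

    Hypothetical helper: defaults mirror the values quoted in the thread
    (tez.grouping.split-waves=1.7, min-size=16MB, max-size=1GB).
    """
    if total_bytes <= 0 or original_splits <= 0:
        return max(original_splits, 0)

    # Start from the number of tasks that fills the cluster for `waves` waves.
    desired = max(1, int(available_slots * waves))
    bytes_per_group = total_bytes / desired

    # Pull the per-group size back inside the [min_size, max_size] range.
    if bytes_per_group > max_size:
        desired = total_bytes // max_size + 1
    elif bytes_per_group < min_size:
        desired = total_bytes // min_size + 1

    # Grouping can only merge splits, never create more than already exist.
    return min(desired, original_splits)


# The case from the thread: two 3MB files, assuming 10 free slots.
print(estimate_grouped_task_count(6 * MB, original_splits=2, available_slots=10))
```

Under these assumptions the two 3MB files come out as a single group, which matches the expectation stated in the original message. The sketch does not model locality or Hive-side constraints, so it cannot by itself explain why two tasks were observed.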
