Hey Jeff, one scenario I recently encountered was a job on about 300,000 files in HDFS. The splitting alone took 21 minutes. So I thought that, by the time the splitting is complete, a lot of splits could already have been processed…
Thanks for your answer!
Johannes

> On 12 Mar 2015, at 10:51, Jianfeng (Jeff) Zhang <[email protected]> wrote:
>
> Hi Johannes,
>
> If the input initializer is not done, workers cannot be started.
> What's your scenario? Why do you want to start the workers before the
> splits are generated? Just to save launch time, or to let the workers do
> other stuff?
>
> Best regards,
> Jeff Zhang
>
> On 3/12/15, 5:38 PM, "Johannes Zillmann" <[email protected]> wrote:
>
>> Hey guys,
>>
>> Dumb question. With Tez, can I have an input initializer that doesn't
>> require creating every split before starting to process the splits that
>> have already been created?
>> That is, if I have a lot of splits and my splitting process takes a long
>> time, can the workers start working while the splitting is still going on?
>>
>> Johannes
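The pipelining Johannes describes (workers consuming splits while the generator is still producing them) can be sketched generically as a producer/consumer queue. This is a minimal illustration under stated assumptions, not Tez's input initializer API: `generate_splits`, `worker` and `run_pipelined` are hypothetical names, and each "split" is just a file path standing in for real split computation.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of split generation


def generate_splits(paths, out_q):
    """Hypothetical split generator: emits each split as soon as it is computed."""
    for p in paths:
        out_q.put(("split", p))  # stand-in for expensive per-file split computation
    out_q.put(_DONE)  # signal that no more splits will arrive


def worker(in_q, results, lock):
    """Hypothetical worker: processes splits as they arrive, stops on the sentinel."""
    while True:
        item = in_q.get()
        if item is _DONE:
            in_q.put(_DONE)  # re-post the sentinel so the other workers also stop
            return
        with lock:
            results.append(item[1])  # stand-in for real split processing


def run_pipelined(paths, num_workers=4):
    q = queue.Queue()  # unbounded, so the generator never blocks
    results = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(q, results, lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()  # workers are running before any split exists
    generate_splits(paths, q)  # workers consume while this is still producing
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    done = run_pipelined([f"/data/file-{i}" for i in range(10)])
    print(len(done), "splits processed")
```

With an unbounded queue the generator never waits on the workers; a bounded `queue.Queue(maxsize=...)` would add backpressure if the workers fall behind the split generation.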
