I may have found the root cause in my case:

    public void materializeAt(SourceTarget<S> sourceTarget) {
      this.materializedAt = sourceTarget;
      this.size = materializedAt.getSize(getPipeline().getConfiguration());
    }

    @Override
    public long getSize() {
      if (size < 0) {
        this.size = getSizeInternal();
      }
      return size;
    }

PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split
to create a temporary table. The sourceTarget is bound to the new temporary
table, whose size is 0 because its path was just created, so this.size is
set to 0. Later, when getSize() is invoked while setting the number of
reducers, size is 0 rather than negative, so the "size < 0" check does not
trigger a recomputation and getSize() simply returns 0. That makes the
number of reducers too small.

So I think materializeAt() should check the sourceTarget's size, like below:

    public void materializeAt(SourceTarget<S> sourceTarget) {
      this.materializedAt = sourceTarget;
      long size = materializedAt.getSize(getPipeline().getConfiguration());
      if (size > 0) {
        this.size = size;
      }
    }

2016-10-17 11:19 GMT+08:00 David Ortiz <dpo5...@gmail.com>:

> That gets tricky if you have input data that is heavily filtered though.
> Perhaps play around with the scale factor on operations that may blow up
> data?
>
> On Sun, Oct 16, 2016, 10:04 PM 陈竞 <cj.mag...@gmail.com> wrote:
>
>> That's a solution, but since the user may not clearly know which step
>> will produce a temporary table, I think setting the number of reducers
>> automatically will improve the user experience. Maybe we can set the
>> number of reducers to 1/3 of the number of mappers before submitting
>> jobs if one of the job inputs is a temporary table.
>>
>> 2016-10-14 18:59 GMT+08:00 David Ortiz <dpo5...@gmail.com>:
>>
>> You can manually set the reducer number using the conf object among
>> other things.
>>
>> On Fri, Oct 14, 2016, 5:43 AM 陈竞 <cj.mag...@gmail.com> wrote:
>>
>> Hi, I found that if the pipeline produces a temporary table, the number
>> of reducers for a temporary table whose input is also a temporary table
>> becomes too small, since the temporary table has no content yet.
>>
>> --
>> 陈竞,中科院计算技术研究所,高性能计算机中心
>> Jing Chen HPCC.ICT.AC China
>

--
陈竞,中科院计算技术研究所,高性能计算机中心
Jing Chen HPCC.ICT.AC China