Issue: CRUNCH-624. Link: https://issues.apache.org/jira/browse/CRUNCH-624?jql=project%20%3D%20CRUNCH

2016-10-18 13:54 GMT+08:00 Josh Wills <[email protected]>:

> Yep, that's right-- can you file a JIRA, and I'll post the patch?
>
> On Mon, Oct 17, 2016 at 10:52 PM, 陈竞 <[email protected]> wrote:
>
>> I may have found the root cause in my case:
>>
>>   public void materializeAt(SourceTarget<S> sourceTarget) {
>>     this.materializedAt = sourceTarget;
>>     this.size = materializedAt.getSize(getPipeline().getConfiguration());
>>   }
>>
>>   @Override
>>   public long getSize() {
>>     if (size < 0) {
>>       this.size = getSizeInternal();
>>     }
>>     return size;
>>   }
>>
>> PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is
>> split to create a temporary table. The sourceTarget is bound to the new
>> temporary table, whose size is 0 because its path was just created, so
>> this.size becomes 0. After that, when getSize() is invoked while setting
>> the number of reducers, size is 0 rather than negative, so it skips
>> getSizeInternal() and just returns 0, which makes the number of reducers
>> too small.
>>
>> So I think materializeAt() should check the sourceTarget's size, like
>> below:
>>
>>   public void materializeAt(SourceTarget<S> sourceTarget) {
>>     this.materializedAt = sourceTarget;
>>     long size = materializedAt.getSize(getPipeline().getConfiguration());
>>     if (size > 0) {
>>       this.size = size;
>>     }
>>   }
>>
>> 2016-10-17 11:19 GMT+08:00 David Ortiz <[email protected]>:
>>
>>> That gets tricky if you have input data that is heavily filtered
>>> though. Perhaps play around with the scale factor on operations that
>>> may blow up data?
>>>
>>> On Sun, Oct 16, 2016, 10:04 PM 陈竞 <[email protected]> wrote:
>>>
>>>> That's a solution, but since users may not clearly know which step
>>>> will produce a temporary table, I think setting the number of
>>>> reducers automatically will improve the user experience. Maybe we
>>>> can set the number of reducers to 1/3 of the number of mappers
>>>> before submitting a job if one of the job's inputs is a temporary
>>>> table.
>>>>
>>>> 2016-10-14 18:59 GMT+08:00 David Ortiz <[email protected]>:
>>>>
>>>> You can manually set the reducer number using the conf object among
>>>> other things.
>>>>
>>>> On Fri, Oct 14, 2016, 5:43 AM 陈竞 <[email protected]> wrote:
>>>>
>>>> Hi, I found that if the pipeline produces a temporary table, the
>>>> number of reducers for a temporary table whose input table is also
>>>> a temporary table becomes too small, since the temporary table has
>>>> no content.
>>>>
>>>> --
>>>> 陈竞, High Performance Computer Center, Institute of Computing
>>>> Technology, Chinese Academy of Sciences
>>>> Jing Chen HPCC.ICT.AC China
>>>>
>>
>> --
>> 陈竞, High Performance Computer Center, Institute of Computing
>> Technology, Chinese Academy of Sciences
>> Jing Chen HPCC.ICT.AC China
>>

--
陈竞, High Performance Computer Center, Institute of Computing
Technology, Chinese Academy of Sciences
Jing Chen HPCC.ICT.AC China
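
For readers following the root-cause analysis in the thread, here is a minimal, self-contained sketch of the caching behavior it describes. It does not use Crunch: SketchCollection is a hypothetical stand-in for PCollectionImpl, the estimatedSize field stands in for whatever getSizeInternal() would compute, and reducersFor() with its bytes-per-reducer value is an assumed illustration of "more bytes means more reducers", not Crunch's actual planner logic. The guardZero flag switches between the original materializeAt() and the version proposed in the thread.

```java
public class MaterializeSizeSketch {

    /** Hypothetical stand-in for a collection with a cached size; not Crunch's PCollectionImpl. */
    public static class SketchCollection {
        private long size = -1L;            // negative means "not computed yet"
        private final long estimatedSize;   // stand-in for the result of getSizeInternal()
        private final boolean guardZero;    // true = apply the check proposed in the thread

        public SketchCollection(long estimatedSize, boolean guardZero) {
            this.estimatedSize = estimatedSize;
            this.guardZero = guardZero;
        }

        /** targetSizeOnDisk plays the role of materializedAt.getSize(conf). */
        public void materializeAt(long targetSizeOnDisk) {
            if (!guardZero || targetSizeOnDisk > 0) {
                // Without the guard, a freshly created (empty) path caches size = 0.
                this.size = targetSizeOnDisk;
            }
        }

        public long getSize() {
            if (size < 0) {
                // Only reached while no size has been cached; a cached 0 bypasses this.
                size = estimatedSize;
            }
            return size;
        }
    }

    /** Assumed rule for illustration: one reducer per bytesPerReducer, at least one. */
    public static int reducersFor(long bytes, long bytesPerReducer) {
        return (int) Math.max(1L, (bytes + bytesPerReducer - 1) / bytesPerReducer);
    }

    public static void main(String[] args) {
        long estimate = 3_000_000_000L;         // assumed upstream size estimate
        long bytesPerReducer = 1_000_000_000L;  // assumed value for this sketch

        SketchCollection original = new SketchCollection(estimate, false);
        original.materializeAt(0L);             // temporary path exists but is empty
        System.out.println("original: " + reducersFor(original.getSize(), bytesPerReducer));

        SketchCollection patched = new SketchCollection(estimate, true);
        patched.materializeAt(0L);              // zero is ignored, estimate is used later
        System.out.println("patched:  " + reducersFor(patched.getSize(), bytesPerReducer));
    }
}
```

With the values assumed here, the original variant falls back to a single reducer because the cached 0 hides the estimate, while the guarded variant still sees the 3 GB estimate. As David Ortiz notes in the thread, the estimate itself can be off for heavily filtered data, so the guard only removes the zero-size case and does not make the estimate accurate.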
