Yep, that's right. Can you file a JIRA, and I'll post the patch?

On Mon, Oct 17, 2016 at 10:52 PM, 陈竞 <cj.mag...@gmail.com> wrote:

> I may have found the root cause in my case:
>
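> public void materializeAt(SourceTarget<S> sourceTarget) {
>   this.materializedAt = sourceTarget;
>   // Caches whatever size the target reports right now; for a freshly
>   // created temporary path that is 0.
>   this.size = materializedAt.getSize(getPipeline().getConfiguration());
> }
>
>
> @Override
> public long getSize() {
>   // Only recomputes when size is negative (not yet known), so a cached 0 sticks.
>   if (size < 0) {
>     this.size = getSizeInternal();
>   }
>   return size;
> }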
>
> PCollectionImpl.materializeAt(sourceTarget) is invoked when a node is split
> to create a temporary table. The sourceTarget is bound to the new temporary
> table, whose size is 0 because its path was just created, so this.size
> becomes 0. Later, when getSize() is called to set the number of reducers,
> size is 0 rather than negative, so getSizeInternal() is never called and
> getSize() simply returns 0, which makes the number of reducers too small.
>
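To make the failure mode concrete, here is a minimal self-contained Java sketch (not Crunch code). `SizeCachingCollection` is a hypothetical stand-in for PCollectionImpl, the 50 GB estimate is a made-up number, and `estimateReducers` assumes a simple one-reducer-per-GB heuristic rather than Crunch's exact formula. Once materializeAt() caches the 0 reported by the empty temporary path, getSize() never consults the estimate again.

```java
import java.util.function.LongSupplier;

public class SizeCacheBugDemo {

    // Hypothetical stand-in for PCollectionImpl's size caching (original behaviour).
    static class SizeCachingCollection {
        private long size = -1;                  // negative means "not computed yet"
        private final LongSupplier sizeInternal; // plays the role of getSizeInternal()

        SizeCachingCollection(LongSupplier sizeInternal) {
            this.sizeInternal = sizeInternal;
        }

        // Mirrors materializeAt(): caches whatever size the target reports right now.
        void materializeAt(long targetSizeNow) {
            this.size = targetSizeNow;
        }

        // Mirrors getSize(): only recomputes when the cached value is negative.
        long getSize() {
            if (size < 0) {
                size = sizeInternal.getAsLong();
            }
            return size;
        }
    }

    // Assumed heuristic (not necessarily Crunch's): one reducer per bytesPerReducer, minimum one.
    static int estimateReducers(long sizeBytes, long bytesPerReducer) {
        return 1 + (int) (sizeBytes / bytesPerReducer);
    }

    public static void main(String[] args) {
        long estimated = 50L * 1000 * 1000 * 1000; // made-up upstream estimate: 50 GB
        SizeCachingCollection c = new SizeCachingCollection(() -> estimated);

        // The temporary output path was just created, so it reports 0 bytes.
        c.materializeAt(0L);

        // The cached 0 is returned; the 50 GB estimate is never consulted.
        long size = c.getSize();
        System.out.println("size = " + size + ", reducers = "
                + estimateReducers(size, 1000L * 1000 * 1000));
    }
}
```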
> So I think materializeAt() should check the sourceTarget's size before
> caching it, like this:
>
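> public void materializeAt(SourceTarget<S> sourceTarget) {
>   this.materializedAt = sourceTarget;
>   long size = materializedAt.getSize(getPipeline().getConfiguration());
>   // Only cache a positive size; otherwise this.size keeps its previous value
>   // (negative if never computed), so getSize() falls back to getSizeInternal().
>   if (size > 0) {
>     this.size = size;
>   }
> }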
>
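The same hypothetical stand-in with the proposed guard applied (again a sketch, not the actual patch): a 0 from an empty temporary path is ignored, so getSize() falls back to the estimate, while a positive size from real data is still cached. The sizes are made up.

```java
import java.util.function.LongSupplier;

public class SizeCacheFixDemo {

    // Hypothetical stand-in for PCollectionImpl with the proposed fix applied.
    static class SizeCachingCollection {
        private long size = -1;                  // negative means "not computed yet"
        private final LongSupplier sizeInternal; // plays the role of getSizeInternal()

        SizeCachingCollection(LongSupplier sizeInternal) {
            this.sizeInternal = sizeInternal;
        }

        // Proposed fix: only cache the target's size when it is positive.
        void materializeAt(long targetSizeNow) {
            if (targetSizeNow > 0) {
                this.size = targetSizeNow;
            }
        }

        long getSize() {
            if (size < 0) {
                size = sizeInternal.getAsLong();
            }
            return size;
        }
    }

    public static void main(String[] args) {
        // Empty temporary path: the 0 is ignored and the estimate is used.
        SizeCachingCollection fresh = new SizeCachingCollection(() -> 50_000_000_000L);
        fresh.materializeAt(0L);
        System.out.println(fresh.getSize());

        // Real data on disk: the reported size is cached as before.
        SizeCachingCollection written = new SizeCachingCollection(() -> 50_000_000_000L);
        written.materializeAt(42_000_000_000L);
        System.out.println(written.getSize());
    }
}
```

One trade-off worth noting: a materialized target that really is empty also falls back to the estimate, which seems harmless here since the value mainly feeds the reducer count.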
>
>
> 2016-10-17 11:19 GMT+08:00 David Ortiz <dpo5...@gmail.com>:
>
>> That gets tricky if you have input data that is heavily filtered, though.
>> Perhaps play around with the scale factor on operations that may blow up
>> the data?
>>
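To illustrate the scale-factor suggestion: as far as I know, Crunch's DoFn has a scaleFactor() hook that the planner uses to estimate how much a step grows or shrinks its input. The sketch below only models that idea in plain Java, with hypothetical names, made-up sizes, and the same assumed one-reducer-per-GB heuristic: each stage multiplies the size estimate, so a heavy filter and an exploding flatMap pull the reducer count in opposite directions.

```java
import java.util.List;

public class ScaleFactorEstimateDemo {

    // Hypothetical model: each stage multiplies the estimated size by its scale factor.
    static long estimateOutputSize(long inputBytes, List<Float> scaleFactors) {
        double size = inputBytes;
        for (float f : scaleFactors) {
            size *= f;
        }
        return (long) size;
    }

    // Assumed heuristic (not necessarily Crunch's): one reducer per bytesPerReducer, minimum one.
    static int estimateReducers(long sizeBytes, long bytesPerReducer) {
        return 1 + (int) (sizeBytes / bytesPerReducer);
    }

    public static void main(String[] args) {
        long input = 10L * 1000 * 1000 * 1000; // made-up input: 10 GB
        long gb = 1000L * 1000 * 1000;

        // A heavy filter (keeps about 1%) followed by a size-preserving map.
        long filtered = estimateOutputSize(input, List.of(0.01f, 1.0f));
        // A flatMap that emits about 20x its input.
        long exploded = estimateOutputSize(input, List.of(20.0f));

        System.out.println(filtered + " bytes -> " + estimateReducers(filtered, gb) + " reducers");
        System.out.println(exploded + " bytes -> " + estimateReducers(exploded, gb) + " reducers");
    }
}
```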
>> On Sun, Oct 16, 2016, 10:04 PM 陈竞 <cj.mag...@gmail.com> wrote:
>>
>>> That's a solution, but since users may not know which step will produce
>>> a temporary table, I think setting the number of reducers automatically
>>> would improve the user experience. Maybe we could set the number of
>>> reducers to 1/3 of the number of mappers before submitting a job, if one
>>> of the job's inputs is a temporary table.
>>>
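A sketch of the proposed heuristic. `chooseReducers` and its parameters are hypothetical names; the rule itself (about 1/3 of the mapper count when any input is a temporary table, otherwise the size-based estimate) follows the suggestion above, with an assumed floor of one reducer.

```java
public class TempInputReducerHeuristic {

    // If any job input is a temporary table (size unknown at planning time),
    // use about 1/3 of the mapper count, never fewer than one reducer.
    // Otherwise keep the existing size-based estimate.
    static int chooseReducers(int sizeBasedEstimate, int numMappers, boolean hasTemporaryInput) {
        if (!hasTemporaryInput) {
            return sizeBasedEstimate;
        }
        return Math.max(1, numMappers / 3);
    }

    public static void main(String[] args) {
        System.out.println(chooseReducers(1, 300, true));   // temporary input: 300 / 3
        System.out.println(chooseReducers(1, 2, true));     // tiny job: clamped to 1
        System.out.println(chooseReducers(40, 300, false)); // normal input: keep estimate
    }
}
```

Since the mapper count is only known once input splits are computed, a rule like this would have to run at job submission time, which matches the "before submitting jobs" part of the proposal.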
>>> 2016-10-14 18:59 GMT+08:00 David Ortiz <dpo5...@gmail.com>:
>>>
>>> You can set the number of reducers manually using the conf object, among
>>> other things.
>>>
>>> On Fri, Oct 14, 2016, 5:43 AM 陈竞 <cj.mag...@gmail.com> wrote:
>>>
>>> Hi, I found that if the pipeline produces a temporary table, the number of
>>> reducers for the job whose input is that temporary table becomes too
>>> small, since the temporary table has no content yet.
>>>
>>>
>>>
>>>
>>> --
>>> 陈竞 (Jing Chen), Institute of Computing Technology, Chinese Academy of Sciences, High Performance Computer Center
>>> Jing Chen HPCC.ICT.AC China
>>>
>>
>
>
>
